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FOREWORD 


The  1968  Army  Numerical  Analysis  Conference,  sponsored  by  the  Army 
Mathematics  Steering  Committee  (AMSC) ,  was  held  on  25-26  April  1968  at 
the  U.S.  Army  Electronics  Command,  Fort  Monmouth,  New  Jersey.  This 
meeting  was  the  seventh  in  this  series  of  similar  meetings.  The  main 
objectives  of  said  conferences  are:  (1)  the  exchange  of  information  of 
interest  to  the  scientific  and  technical  staffs  and  also  the  technical 
managers  of  the  Army  "other  than  business"  computer  and  (2)  to  provide 
the  AMSC's  Subcommittee  on  Numerical  Analysis  and  Computers  with  informa¬ 
tion  regarding  the  Army's  current  needs  in  numerical  analysis  and  related 
fields  of  mathematics. 

Mr.  A.D.  Hall  of  Bell  Telephone  Laboratories  was  the  first  of  the 
three  invited  speakers  to  give  his  address.  His  topic  was  "An  Intro¬ 
duction  to  ALTRAN".  The  other  two  guest  speakers  were  Mr.  Gordon  Sande 
from  Princeton  University  and  Professor  Andries  van  Dam  of  Brown  Uni¬ 
versity,  Respective  titles  for  their  addresses  were  "The  Fast  Fourier 
Transforms  —  Survey  of  Selected  Applications"  and  "Computer  Graphics". 

In  addition  to  these  informative  talks  there  were  eight  interesting 
papers  contributed  by  Army  scientists. 

Mr.  Leon  Leskowitz  served  as  Chairman  on  Local  Arrangements.  Those 
in  attendance  would  like  to  thank  him  for  his  efforts  in  their  behalf. 

Not  only  did  he  provide  excellent  local  accommodations,  but  he  also 
organized  a  large  portion  of  the  program. 

Dr.  John  H.  Giese,  Chairman  of  the  Conference,  has  asked  that  the 
Proceedings  of  this  meeting  be  published  and  issued  to  interested  Army 
scientists.  He  wishes  to  take  this  occasion  to  thank  all  the  speakers 
for  their  very  interesting  papers  and  the  various  chairmen  for  their 
help  in  the  conduction  of  this  meeting. 
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INVERSION  OF  HANKEL  AND  TOEPLITZ  MATRICES 


Choong  Yun  Cho 

Maggs  Research  Center,  Watervliet  Arsenal 
Watervliet,  New  York 

1.  Introduction.  In  this  paper,  inversion  of  the  Hankel  and  Toeplitz 
matrices  is  demonstrated  in  two  ways.  In  the  first,  we  decompose  a 
general  matrix  into  triangular  factors  expressed  in  determinantal  forms . 
The  inverse  of  these  factors  is  also  obtained  in  determinantal  forms. 

The  results  are  then  applied  to  the  Cauchy  matrix,  as  well  as  the  Hankel 
and  Toeplitz  matrices,  and  explicit  expressions  for  the  factors  are 
obtained. 

In  the  second  approach,  we  show  that  the  triangular  decomposition 
of  the  inverse  of  the  Hankel  matrix  yields  a  system  of  orthogonal  poly¬ 
nomials.  Then  the  theory  of  continued  fractions  and  the  q^d  algorithm 
for  the  triangularization  of  the  inverse  of  the  Hankel  matrix,  are 
utilized. 

A  comparison  with  a  standard  algorithm  for  the  inversion  of  the 
Hankel  matrix  is  made  by  using  the  well-known  Hilbert  matrix,, 


The  Hankel  matrix  of  order  (n+1)  is  of  the  form 

p  ^ 

C  C  -  « « •  c 

o  1  n 

C1  C2  Cn+1 

H  = 

C  C  , -  • • •  Ck 

L  n  n+1  2n  J 

=  j=0 >  1  > 2 , « « •  >n) ) 


The  remainder  of  this  paper  was  reproduced  photographically  from  the  author’s 
manuscript. 
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Matrices  of  this  form  are  closely  connected  to  the  moment 
problem  on  the  real  axis  and  also  arise  as  coefficient  matrioes 
of  the  normal  equations  in  problems  of  least  squares  pplynomiptl 
curve  fitting. 

The  Toeplitz  matrix  of  order  (n+l)  has  the  form 


*o  a-l 


‘-n 


ai  ao  •••  a-n+l 


®n  *n-l  **•  ac 


■  i  i*  J-0|l|2|.,.,n)) 


such  matrices  arise  in  the  trigonometric  moment  problem. 

The  Toeplitz  matrix  is  closely  connected  with  the  Hankel 
matrix  and  in  numerical  computation  of  the  inverse,  the  algorithm 
for  the  Hankel  matrix  can  be  carried  out  exactly  in  the  same  way 
for  the  Toeplitz  matrix. 

To  be  more  specific,  we  consider  a  Toeplitz  matrix 
T  =  ((a^j  :  i,J*0,l,2,...  ,n))  of  order  (n+l)  and  put  an..n"bw» 

m  *  0,1,2, ...  ,2n,  then  the  matrix  T  becomes 

...  bj  b0 
...  bj  bx 


[^b2n  •••  bn+l  hnj 

or 


2 


bo  bx 


•  •  • 


1 


Denote 


T  =  H*E,  vbere  E  is  the  elementary  matrix,  E 


and  H  a  Hankel  matrix  ((bi+j  :  i,J*0,l,2,. . . ,n)). 


-1  -1 

Then  we  have  T  =  E‘H  . 

That  is,  the  inverse  of  Toeplitz  matrix  can  he  obtained  by 
inverting  the  corresponding  Hankel  matrix  and  rearranging  its 
rows. 


2.  Triangular  Decomposition  of  a  General  Matrix 

We  first  develop  the  triangularization  of  a  general  square 
matrix  in  this  section  and  it  will  be  applied  to  the  Oauchy 
matrix  in  the  following  section. 

It  is  convenient  to  use  the  following  notation  for  the 
general  square  matrix  A  of  order  k: 


A  ■  (aj  b2  c3  •••  vk_^  vk) 


3  - 


tfe  denote  the  determinant  of  thia 
Thus 


I  bld3h4  I  “  d«t 


matrix  by  |  a^jC,  .,. 

d, 

d3  h3  . 

<U  hAj 


It  is  known  (Turnbull  [^5  J  ,  p.  369)  that  for  the  usual 
triangular  de compos iti on  A  ■  LU,  the  elements  of  L  sad  U  cam 
be  expressed  in  terns  of  determinants  involving  the  elements  pf 
A  as  follows  (it  is  assumed  that  the  decomposition  is  possible 
and  L  has  been  chosen  to  have  units  in  ths  principal  diagonal): 


1 

I  aal 
I  ai' 


1  aal 

laibal 

1 

lail 

laibal 

earn 

laml 

1*1^1 

laib2cml 

...  1 

lail 

lalb2l 

lalb2c3l 

lakl 

•  •  • 

i*Acl 

|aib2ckl 

laxbac3dk| 

l*x' 

laibal 

l*ibs«»l 

|axbac3d4i 

u 


and 


Kl 

lcil 

M 

N 

laibaj 

l»lC2{ 

Mai  ... 

|«i»a| 

l»il 

l»xl 

I*ll 

I.J 

|aibaca| 

|*ibadsl 

Mai 

Mai 

Mai 

|*iba 

l*lb*  ••*vk-l  I 

It  can  be  shown  (see,  Cho  [Y]  )  that  L*1  and  U**  can 
similarly  be  expressed  explicitly  as  follows i 
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1 


laa| 

“lai| 

1 

|a2b3l 

Ialb3l 

1 

laibal 

laiba| 

la3b3Cj 

jaxb3°*  1 

hib^4| 

Iaiba03l 

1 *iba°» 1 

l*ib«P»| 

k-1  1  aab3«**vk  1  k  J  »ab3c4...Y1(  j 
|  a1b3.,.vk-1j  ^  I  aibac3»»«Tk-]j 


(3) 


6 


•  •  • 


3.  Triangular  Decomposition  of  touchy  Matrices 

Recently  Schechter  [2  ]  ,  and  (apparently  independently) 
Trench  and  Scheinok  [  U  ]  have  shorn  the  elements  of  the 
inverse  of  a  touchy  matrix  may  be  written  dowi  in  simple  closed 

form. 

From  the  new  determinantal  expressions  that  hav*  been 
developed  in  Section  2,  we  now  can  express  the  triangular 
decomposition  of  the  touchy  matrix  explicitly,  and  furthermore 
that  the  same  is  true  of  the  inverse  of  these  triangular  factors. 
We  denote  the  touchy  matrix  of  order  k 


t 

1 

i 

1 

ftx+px 

«7f. 

Ik 

1 

i 

• 

1 

c  • 

• 

h 

• 

1 

1 

k*A 

‘V/& 

V/ 

by 


I  m,  n 


1,2 » • • • »k) )• 
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Theorem 


Given  the  Cauchy  matrix  of  order  k,  C  -  ((  1  I  *# 

<W» 

n  =  1,2, . . .  ,k.) ) ,  where  (X  x  ,  0<a  •  •••*  * 

“  [3  2  *•••»  -  are  pairwise  distinct  and  C  ■  LU.  Then  the 

triangular  factors  are 

«VA>  f4  (l) 

V,n’  «X„y»)’U 

1  >1  («-«><&•£> 

“m-n  "  (<*♦/?„>  ’U  <£V/?n)(°V/?.)  (al 

Proof:  (see,Cho£lJ  ), 

Theorem  2 

The  elements  of  the  matrix  inverses  of  L  and  U,  where  C  *  LU» 
are  given  by 


u’1). 


(U‘l). 


m  1 

TT 

n-1 

T~T 

t-l  (<Xn-«t» 
t/n 

1  1 

8-1 

<‘V/A> 

n-1 

it 

n 

i  I 

s-l 

<°VA> 


(»*»)  (3) 


(CX  -£*  ) 

n  • 


(»<n)  Ik) 


U.  Hankel  and  Toeplitt  Matrices 

In  two  special  cases  it  turns  out  that  the  Cauchy  matrix 
reduces  to  one  of  Toeplitt  fora:  These  occur  vhen 

i 

1 

Ta  ■  ((  —  }  a,  n  •  1,2,...  ,k)) 

a  -  n  ♦  x 

where  x  is  a  constant,  and 


Ta-  (( 


1 

— - —  l  n,  n  •  1,2,... ,k)) 

p  -  q 


where  p,  q,  and  x  are  constants. 

In  the  first  case  we  obtain  readily  L,  U,  L  ,  end 
if1  by  putting  *  a  tad  An  ■  x  -  n.  The  invent  becomes 

k 

TT  (  x  ♦  t  -  a  )(  *  -  t  ♦  n  ) 

-1  t“1 

(V  )«,n - “k - k - 

(  n  -  a  ♦  x  (  a  -  r  (••*) 

r*l  »ml 

r/m  s/n 


In  the  second  case  we  note  that 


where  D  is  a  diagonal  matrix  with  d^  ■  qn  ,  n  ■  l|2,.«.,fc. 

The  first  factor  of  Ta  now  has  the  Cauchy  form,  i.e., 

»d  /3n  ■  pqn  ,  and  the  triangular  factors  are 
readily  expressed.  The  inverse  of  Ta  ia 

k 

T1  (pqm  -  qx4t)(pqt  -  q**n) 

-i  t-i 

(Ta  )  -  - — 

a  '«,*»  k  k 

7*  |  \  (qr  -  q")  •]  f(qn  -  q*) 

r-1  a-1 

r/m  a/n 


where 


„ _ k  kx-x+2m  „k-l  kx+m+n 

y  s  pi  -  p  q  • 


The  Hankel  forms  corresponding  to  the  above  Toeplits 
matrices  Tj  and  Ta  are 

Hx  a  ((  — 1  .  ;  m,  n  ■  l,2,...,k)) 

n  +  n  ♦  x 

and 

Ha  =  ({  -  i  n,  n  ■  l,2t...,k)) 

8  ^  m+n+x  ' 

p  -  q 

respectively. 


The  triangular  factors  of  Hx  and  Ha,  and  their  inverses 


are  Immediately  obtained  by  ehooeing  0(a  and  i#  tb# 

theorems.  In  particular,  the  inverse  of  Ha  is 


(h;1) 


*,n 


TT  (m"*  -  -  «.*•"> 

fl  _ — 

j  •  IT  (q‘r  -  <f“)*TT  if  -  q*) 

r-l  •*! 

r/n  •/» 


(7) 


where 


pkqk*-x-2«  .  pk-l^x-m+n  # 


5.  The  Hankel  Matrix  In  Connection  With  The  Theory  of  Continued 
Fractions  and  Orthogonal  Polynomials . 

In  the  previous  section  we  had  obtained  the  triangular 
factors  and  the  inverse  of  the  Hankel  and  Toeplit*  matrices  by 
triangulari zation  of  the  Cauchy  matrix.  This  was  accomplished 
using  the  results  for  a  general  matrix  expressed  in  terms  of 
determinantal  forms. 

We  now  explore  the  LU-deconpoaition  of  the  inverse  of  the 
Hankel  matrix  from  a  somewhat  different  point  of  view. 

Suppose  we  have  decomposed  a  given  Hankel  matrix  H  into 
the  LU  form  where  L  is  unit  lower  triangular  matrix,  i.e., 
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H  ■  LU 

or  L-1  H  -  U 

Denoting  L”1  ■  K,  we  have 

KH  -  U 

The  requirement  that  the  product  KH  be  upper  triangular 
imposes  (  r+  l)  conditions  on  the  (  r+  l)th  row  of  KH,  i«e«, 
r 

^kraW°  t  -  0,1,2,.  ...r-l 

3-0 

kr.  °..r  *  0  (w> 

8-0 


(1) 

(2) 


where 
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Nov  v«  define  a  (linear)  functional  Z  luch  that 


t  ■  0|1|2|«m| r*l 
t  ■  r 


(9) 


Since  t  and  r  are  dummy  indices,  we  may  write 

l{pt(x)  Pr(x )]■  =  5tr*dt  (  dt  :  constant)  (10) 

where  S  ia  Kronecker  delta, 
tr 

In  other  words,  suppose  that  from  the  (  r+  l)th  row  of  the 
triangular  matrix  K  =  L  defined  in  Eq.  (4),  we  fora  the 
polynomial 

r 

PrW  ■  X  kr, 

a-0 

Then  if  we  can  find  a  linear  functional  I  such  that  Eq.  (5) 
is  true,  this  system  of  polynomials  is  orthogonal  in  the  sense 
of  Eq.  (10). 

It  is  a  well-known  fact  (Tchebyehef;  Stieltjes)  that  a 
power  series 

cs 

8+1  (ID 

X 


I 


may  be  expressed  as  a  continued  fraction,  the  "corresponding" 
continued  fraction  which  has  the  form 
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'o  *1  ei  4a 


(12) 


X-  1-  X-  1-  X- 

The  even  part  of  this  fraction!  ths  "associatsd"  continued 
fraction  is 


A  A 


x  -  c*-  X  -  C*  -  x  -  <v  - 
o  i  a 


where  the  connection  between  the  CXr  ,  j$r  ,  and  the  er  ,  qr 
is  explained  later. 

It  is  also  well  known  that  both  the  numerators  and  denomi*- 
nators  of  the  successive  convergents  of  the  associated  continued 
fraction  form  systems  of  orthogonal  polynomials  (see,  e.g. , 
Shohat  and  Sherman  ^3  J  ;  Wynn  [7J  ).  The  construction  of  the 
quantities,  ef  ,  qr  ,  o(r  ,  and  is  eased  by  the  use  of  the 
q-d  algorithm,  which  we  will  describe  shortly. 

Denote  the  rth  convergent  of  the  associated  continued 
fraction  by  or(x)  /  Pr(x)  ,  then  pr(x)  is  a  polynomial  of 
degree  r  and  indeed 


PrU)  -  krB  xB 

s»0 


(13) 


where  k  are  the  elements  of  the  matrix  K. 

rs 

In  short,  for  a  given  Henkel  matrix 


H  =  ((  Ci+j  •  i»  i  *  0,1,2,...)) 

the  system  of  orthogonal  polynomials  may  be  constructed  via 
the  associated  continued  fraction.  This  can  be  carried  out 
by  the  q-d  algorithm.  The  process  then  leads  to  the  matrix 
K,  which  is  the  inverse  of  the  unit  lover  triangular  factor 
of  the  given  Hankel  matrix  K. 

Furthermore, 

H  «  LU  *  LDLT  (H  eymetric) 

and  H*"1  =  (lVW1 

or  H”1  =  KT  D"1  K  (14) 

where  D  is  a  diagonal  matrix  with  elements  dy  ,  r  ■  0,1,2,...  . 

The  three-term  recursion  formula  for  the  polynomials  pr(x)  is 

Pr(x)  ■  (x  -  °^r_1)  Pr-1(*)  ~  pr-2^  *  U5) 

r  *  2,3** • • 

Multiplying  both  sides  by  x  and  applying  the  functional 

I,  ve  have 

lf*r"2  PrU>}  -  If*1"1  -  °r-l  1{1‘r"2  Pr,l(x)} 

1{x**  pr-2(x)}  •r'2>3*- 
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This  gives 


1{*r_1  J>r-i(=c>}  ‘  /Sr-2  1{*r'2  Pr-2U)} 

1{*r"1  rr-l(x)}  ■  /Sp-2  /Sr- 3  ^  3  pr-3,x1} 


*  coTT  ft 


r-2 


s-0 


r  »  2,3,*..  •  (16) 


Since  dQ  *  c  (  ve  have  obtained  a  formula  for 
obtaining  dr  ,  r  -  0,1,2,...  .  We  see  readily  that 

d1^2  r  »  0.1,2,...  ,  are  the  normalization  factors  for  the 

t  •  P  f 

polynomials  p  t(x)  . 

.  (+) 

6.  Quotient-Difference  (q.— d)  Algorithm 

(I)  In  the  previous  section  ve  considered  the  power 

(+)  We  base  our  exposition  upon  two  papers  of  Wynn  (£6j,  |[7j) 
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series 


8*0 


We  could  also  consider  the  power  series 


oo 

x 

8*0 

obtained  from  (l)  by  removing  the  first  m  terms  and 
multiplying  the  result  by  x*0. 

This  is  equivalent  to  taking  an  arbitrary  square  section 
of  the  Hankel  matrix.  To  indicate  this  general  situation,  we 
write  e£m^  and  q^,°^  for  er  and  qr  in  Section  5, 

The  power  series  (2)  may  now  be  expressed  by  the 
corresponding  continued  fraction 


cm+s 


B+l 

X 


(m)  (m) 

ei  ^a 


IM 


x-  1- 


X-  1-  X- 


The  coefficients 


Am) 

© 

r 


(m*0,l,2,. . i ;  r-1,2,...) 


are  obtained  by  means  of  the  4-d  algorithm  relationships  vhich 
were  discovered  in  principle  by  otieltjes. 

The  quantities  q<m)  .  e‘">  are  pieced  in  the  follolng 

two  dimensional  array 


20 


by  the  following  relationships 


(ra)  (m) 

We  compute 


(m)  (m+l)  (m)  (a+l) 

*r  qr  "  qr  +  er-l 

(nr+l)  (m+l) 

.(a)  s  ^  - 


(4) 


(5) 


Hr 

(m) 

er-l 

(m) 

with  the  initial  values  e.  =  0 

(m=l |2 | • • • ) 

q(za)  *  . 
H1 

cm+l 

cm 

( m=0 | 1 | • • • ) 

(6) 

The  even  part  of 

the  continued  fraction  (3)  is 

cm 

(a)  (a) 

1  ql 

•  •  • 

(m)  (m) 

e  a 

r  *r 

mm  •  •  • 

(7) 

~4">-  -4"1  -  .<■*! 

_ (a)  (a) 

**Vl  “  er  1 

or 

Cm  i 

5‘B) 

t  •  • 

/j(m) 

P  t-2 

MM 

(6) 

(m) 

x-*  - 


where 


x-  o( 


(m) 


(m) 


x-  OC. 


(m) 

r-l“ 


fit 


(a) 


f(a) 


(m)  (a)  *j  \ 

er-l  qr-l  »  *r-l 


( m= 0 11 12 1 • • • ) 

(a)  (a) 

\  +  er-l 

(mxO|l|iM  l  r»2,3n  i • ) 


(9) 
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The  rth  convergent  of  expansion  (8)  may  be  written  as 


the  quotient  of  two  polynomials 


o(n)(x) 

r 

P^U) 


(10) 


where  pv  ;(x)  is  of  degree  r  and  may  be  written 
r 


(m)  -L.  (m) 

pr  U>  ’  Z  * 
8*0 


(11) 


(m), 


and  o'  (x)  is  of  degree  (r-l)  so  that 
r 


o(B)(x) 


.<■>  • 

Z  l*.*x  • 


(12) 


s*0 


(m) 

The  polynomials  p'  (x)  form  an  orthogonal  system  in 

r 

the  sense  that 


yk<» 

8*0  r’' 


_  CTnit+fl  "  ®  (t«0,lt2*»« •  »r-l) 

s  m+t+s 


(13) 


while 


yk(B)c  t  o . 

/  r,s  m+r+s 


(1U) 


8*0 


The  Inequality  (14)  is  rendered  definite  by  imposing  that 

k(m)  -  1  (r-0,1,2,...)  .  (15) 

r,r 
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Then  the  polynomials  p^(x)  are  the  denominators  of  the 

r 

convergents  of  expansion  (8);  they  satisfy  the  three-tarn 
recursion  formula 


(m) 

P„  (x) 
r 


,  (a)/  v  ^(a)  (m) 

U  “  *r-l)  Pr-1  X  “  /*r-2  V2(x) 


(16) 


with 


(r»2,3,...) 


(a),  . 
P„  (x) 
0 


=  1  . 


p‘n)(x) 


x  -  o( 


(a) 


U7) 


If  the  coefficients  e^  ,  {3^2  are 

then  the  polynomials  p^(x)  may  be  constructed  by  the 

r 

recursion  formula  (16). 

Relationships  (l6)  and  (17)  may  be  expressed  in  terms  of 
(xn) 

the  coefficients  k  .  m=*0 ,1,2,. . . , 

rfs 


ku),kui  _*<»><»)  _*<“)>>  m) 

r,s  r-1,8-1  r-1  r-l,s  /  r-Z  r-2,s 


k(»)  ,  ku>  .  * 

rtr-l  r-l,r-2 


(s=l,2,. . . »r— 2 ) 

(m) 

r-1 

( r=2 (3 i • • • ) 


Ja)  -  /y(a)  J®)  V  (°) 

r,0  "  "  ^r-l  Kr-1,0  “  y#r-2  K  r-2,0 

(r«2,3»...) 


(19) 


(a)  . 

with  the  initial  values  k.  ■  -  c  /  c 

1 1 U  2&TX  B 
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(m)  I'm)  _/(m) 

We  have  seen  that  the  coefficients  q^  ,  e^  ,  ^ 


and  A  '  ’  may  be  obtained  by  the  q-d  algorithm,  i.e.,  by 
r  r-2 

means  of  relationships  (4),  (3),  (6)  and  (9). 

With  regard  to  the  construction  of  we 

have  another  method  at  our  disposal  -  an  orthogonal  procedure, 
which  is  equivalent  to  the  triangular  decomposition  of  a  given 
Hankel  matrix. 

Denote 


f 

/  r,s  mfr+s+1  r 

s»0 

k  -  1 
r,r 

(r-0,1,2,...) 

then  the  three-term  recursion  formula  (16)  yields 


(ei) 


2ii 


0  -  d 


<■>  *<■)  » 


r-1 


l .  e. 


:tnd 


(m)  m 

r-2 


-  fi 

*r-l 


r-2  ar-2 


.<■) 

dr-2 


(r*2|  )|«m) 


i.e. 


0  gr-l  ’  «r-l  dr-l  "  /Or-2  gr-2 

K(m)  - 

(m)  .  gr-l  Pi 


0( 


^(»)  -(■) 
r-2  gr-2 


or 


0( 


r-1 


("0  . 
r-1 


.(m> 

OC  o 


or 


(to) 

H,r" 


.(to) 

dr-l 

(m) 

gr-l 

(to) 

gr-2 

.(") 

dr-l 

(to) 

dr-2 

(r-2,1,...; 

,  we  have 

(m) 

’  *1,0 

• 

(m ) 

quantity  r  by 

r 

E 

3*0 

k(»)  _ 
r,s  ra^t*s 

r 

E 

3*0 

Kr,8  cm^r*s 

r 

E 

g"0 

.(to) 

Kr,s  Sn+t+a 

(m) 


(r*0,l,2,.., ,t-l) 


(22) 

(23) 

(2U) 

(25) 

(26) 

(27) 


(20) 


<29) 
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Then  in  particular 


h(m)  - 

hr*l,r 


and  equation  (26)  becomes 


(r-0,1,2,...) 


h 


(m) 

r-l,r-2 


(i*"2*  3>».  • )  • 


It  can  be  shown  that 


.(m) 

®t  *  cm+2t 


with 


-m 


(m 

t, 


r-0 

(t-1,2,...) 


and 


t-1 

.  (m)  .("0  (m) 

kt,s  "  “  2—t>  ^t,r  kr,s 
r**s 

(s»0,l,2,...,t-l). 


Thus  equations 
procedure. 

Starting  with  (i.e. 


(23)  to  (3U)  offer  the  following 


t»0) 

.  (m)  .  .  .  (»)  . 

1  h0,0  1  i  k0,0  1 


(30) 


(31) 


(32) 


(33) 


(3M 


recursive 


26 


we  form  (t*l) 


cm*l  /  ^ 


k 


(m) 

1,0 


e  >> 
cm*2  “  ^  * 


and  obtain 


a  (m)  .(»)  ,  .0") 

P  0  dl  /  d 0 

O/  ("»)  .  .  U) 

0  ”  kl,0  (Equation  (27))  . 

For  t*2  we  have 


An)  ,  .(m) 

h2,0  *  cm«-2  /  d0 


/  ,  (m)  *  §  _(m) 

(cm>3  *  kl,0  cm+2  )  /  dl 


k 


(m) 

2,0 


(n)  (m) 

h?,l  ki,o> 


we  obtain 

oC 


(m) 

.  »>> 

2,1 

-  h2,l 

,("0 

-  «  j(m) 

2 

°m+U  "  d0 

(n) 

.  v,(m) 

1 

h2,l  "  hl,0 

(m) 

1 

■  4b)  / 1‘"> 

.  h 


2,0 


(n)  (m) 

1  *  h2,l 


We  repeat  the  steps  for  t"3,U,,,,,n  • 
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(II)  To  summarize,  we  denote 


where 


k 


(m) 

0,0 


.  ("») 

An) 

kl,0 

kl,l 

• 

An) 

• 

• 

(m) 

kn,0 

kn,l 

.  (m) 

h0,0 

An) 

An) 

hl,0 

hl,l 

• 

• 

(m) 

• 

An) 

hn,0 

hn,i 

k<"> 

Kn,n  J 


>) 

hn  .n 


(36) 


(39) 


.(*) 

do 


.(»*) 


(UO) 


(m) 

r,s  • 


An) 

hr,s 


and 


,(m) 

dr 


are  defined  in  the  preTioue  paragraph. 
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For  a  given  Hankel  matrix  of 'order  n+1 


cm+l  C™*n 

cm*2  *’*  cm*n+l 


cm*n  cm-*n+l  *  ’  *  Cm+2n 


m 


expressions  (35),  (36),  (37),  (W)  and  <W  "»7  *•  asumsibled 
as  the  matrix  equation 


In  fact 


and  we  have 


K(m)  H(m)  K(m)T  .  D(m) 

(4«) 

K(m)  L(m)  .  L(m)  K(m)  .  j 

(43) 

H(m)  .  L(m)  D(m)  L(m)T 

(44) 

H(m)"3  .  K("»)T  D<m)  K(m) 

(45) 

Relationship  (Ul)  Is  the  triangular  decomposition  of 
Hankel  matrix  (41)  • 

in  term,  of  th.  quantities  d‘m)  ,  4"’  th*  e'n,r*1 
-1 

of  is  given  by 


■  Z 

r-max(i,J) 


(m)  .(mf1 
kr,i  V 


(46) 
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and  in  terms  of  the  quantities 
Riven  by 


(■ 


n 


rs*ax(i,J) 


<■> 


(0  ^n) 


Thus  the  groups  of  equations  (23)  -  (3h)  in  conjunction 
with  (U6),  and  equations  (h) ,  (5),  (6),  (9),  (16),  (17)  in 

conjunction  with  (U7),  offer  two  algorithms  for  the  Inversion 
of  the  Hank  el  matrix  *  an  orthogonal  procedure  and  a 

q-d  algorithm,  respectively. 


An  actual  comparison  for  the  algorithms  will  be  demonstrated 


in  the  following  paragraph. 


(Ill)  a  Comparison  Between  Exact  ant}  Computed  Result* 


These  closed  formulas  enable  us  in  the  given  cases  to  corapar* 

the  exact  values  of  (H’^)  with  those  arising  from  machine 

*■»  J 

computation  based  upon  Gauss  elimination  and  the  two  methods 
described  in  the  previous  sections. 

Since  the  Hilbert  matrix  is  a  form  of  Hankel  matrix,  ((c^  • 
l/(m+l)j  m-0,1,2,...)),  it  is  interesting  to  compare  our  methods 
with  Gauss  method  (e.g. ,  Crout  reduction  with  pivoting)  for  the 
inversions  of  l*xl*,  5x5,  6x6,  7x7,  8x8  and  9x9  matrices  by  a  CDC 
1601*  computer.  In  the  following  table  we  show  the  three  oorner 
elements  of  each  inverse,  i.e.,  (0,0),  (0,n)  and  (n,n)  elements, 
obtained  by  three  different  algorithms.  Throughout  the  compu¬ 
tation  single  precisions  were  used. 


TABLE 


Row  (A) 

q-d  algorithm 

Row  (B) 

Crout  reduction 

Row  (C) 

orthogonal  procedure 

Row  (D) 

exact  value 

Order 

(0,0) 

(0,n) 

(n,n) 

1*  (A) 

1.6000000162  El 

-1.1*000000260  E2 

2.80000001*18  B3 

(B) 

1.6000000512  El 

-1.1*000000838  E2 

2.8000001360  E3 

(c) 

1.6000000275  El 

-1.1*0000001*07  E2 

2.8000000565  E3 

(D) 

1.6000000000  El 

-1.1*000000000  E2 

2.8000000000  E3 
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Order 

(0,0) 

(0,n) 

(n,n) 

5 

(A) 

2.50000051*58  EL 

6.3000032021*  E2 

1*.  1*100018881*  El* 

(B) 

2.5000012051*  El 

6.  3000067790  E2 

1*.  1*1000  38862  El* 

(C) 

2.5000005115  El 

6.3000027626  E2 

l*.  1*100011*959  El* 

(a) 

2.5000000000  EL 

6.3000000000  E2 

l*.  1*100000000  El* 

6 

(A) 

3.6000183765  El 

-2.77203791*50  E3 

6.9855180101*  E5 

(B) 

3.6000366371  El 

-2.7720752723  E3 

6.985599991*8  E5 

(c) 

3.60001*97287  El 

-2.7721155771  E3 

6.9857099602  E5 

(D) 

3.6000000000  El 

-2.7720000000  E3 

6.9851*1*00000  E5 

7 

(A) 

U. 900 322571 3  El 

1.2011*192921*  El* 

1.130051*31*12  E7 

(B) 

U. 90105 3 3816  El 

1.201981*3971*  El* 

1.1105073673  E7 

(c) 

1*. 90185 31675  El 

1.2025989916  El* 

1.110951*1*587  E7 

(D) 

U. 9000000000  El 

1.2012000000  El* 

1.1099088000  E7 

8 

(A) 

6.1*0271*83768  El 

-5.151*1127631*  El* 

1.7681321*615  E8 

(B) 

6.1*29501*31*33  El 

-5.2301722659  El* 

1.7895337516  E8 

(c) 

6.30391*72898  El 

-1*.  8101*553005  El* 

1.6501*227  376  E8 

(D) 

6.1*000000000  El 

-5.11*80000000  El* 

1.7667936000  E8 

9 

(A) 

8.2689617719  El 

2.  381*169381*6  E5 

3.01*58808566  E9 

(B) 

9.2825392880  El 

3.1*1*29869353  e5 

1*.  1271688201  E9 

(C) 

6.3366261800  El 

1.167510671*3  E5 

1*.  1711 366  759  E9 

(D) 

8.1000000000  El 

2.1879000000  e5 

2.8158273000  E9 
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7*  Two  Special  Cases 


In  Section  1*,  we  obtained  explicit  expressions  for  the 
inverse  of  two  special  matrices,  Ha  and  Ha  (T*  and  Ta),  through 
the  triangular!  zation  of  the  Cauchy  matrix.  We  now  show  the 
results  via  the  q-d  algorithm  and  will  see  where  these  matrioes 
are  derived  from. 

We  have  shown  in  Eq.  (U7)  of  Section  6  that 


n 


r-raax(i,J) 


(m) 

s 


(1) 


(0  *  i,j  *  n) 

where  is  of  order  (n+1). 

Wynn [7]  obtains  by  the  q-d  algorithm  closed  expressions 
for  e£m)  ,  q£m)  ,  and  of  the  following  Hankel  matrices  1 


Hi  « 


'  c0  -  1 

cx(o<+l)  ...  (c*+ri-l) 

m  KOT+1)  ...  (r+m-1) 


(m»l,2. 


and 


Ha  t 


/.  #  Wl  «x+lN  /,  «+m-lx 
(A-q  )(A-q  )  ...  (A-q  ) 

V  w.  Tf+1*  Tf+m-1* 

(C-qg  )(C-q  )  ...  (C-q  ) 


(m*l,2. 


where  A,  C,  q,  cxt  and  V  are  constants. 


For  Hj.  we  have 


(m) 

r,s 


(-D 


r-s 


«:>  < 


«  +m*r-l 
r-s 


(  If  +ra+2r-2  j 


(2) 


r-s 


(m-0,l,2,...j  r-l,2,...j  s-0,l,...,r-l) , 


(m)  (r+l)(  ^-<x+r)(o<  ♦m+r)(  y+m+r-1) 

M  ■  ■  —  i  .■■  ■.■  .  ■  ■■■■■■  i 

r  ( If  +m+2r-l)(  ^f+m+2r)^(  ^T+m+2r+l) 


(3) 


(r-0,1,2,...) 


and  for  H2 

(m) 


(r-s)(r-a-l) 


k\—/  /  i  \ r-a 

r,s  *  (-1)  Q 


(l-<ir)(l-qr-1)...ri-qr-"1) 


(W 


(l-q)(l-qJ)...{l-q“) 

(A-q*™« )  (A_q  .  (A_q« 

•  -  -  — -  '  'I-  "  -  -  -  ' 

(C-q  T  (o-q  ^  *n"r*“) . . .  (c-qir-ra’’ar-2) 

(m"0,l,2,...  j  r-l,2,..,j  s-0,l,..,,r-l), 
/,  r+lw_  cm  .  7f+rw.  CK+m*rw-  Tf+m+r-1* 

(»)  .  m+2r  (1-1  KCq  -M  KM  )(C-q  )  ( 

r 


If  ♦m+2r-lWr,_/>  -f  ♦m+2r  j  2^C-q  If  ♦m*-2r<-l 


(C-q 0  m  £r_J')(C-q 

(r“0>l»2,..») • 


1 


,(i») 


-1 


Introducing  these  expressions  in  Eq. (1),  we  may  obtain  Hv 
in  closed  forms,  i.e.,  in  terms  of  simple  products  and  exponential 
factors.  It  is  a  curious  fact,  and  one  of  the  results  of  our 
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investigation,  that  these  scalar  products  can  be  simplified 
only  when  If  ■  o<  ♦!  and  A  ■  C,  namely, 


»•••*  a)}  , 


and 


Ha 


((  ©m 


A-q 


A-q 


cx+m 


}  m-0,l,2,...,n))  • 


The  inverse  becomes 


and 


n+i  n+j 

TJ  (oC+r)  JJ  («♦.) 

r-i  s-j 

m  — . . . .  ■■■■■■■.  i 

i!  J!  (n-i)».  (n-J)J  cx(oc*i*j) 
(0  ‘  i,j  ‘  n). 


respectively. 


TT  (A-qW+1+t)(A-qa+^+t) 

t-0 


An  q"“  (A-^HA-q0^)  ff  CqV)  ff'lV) 

r«0  s*0 

iyi 

(0  ‘  i,j  <  n), 


We  mentioned  in  the  introduction  that  there  is  a  simple 
connection  between  Henkel  end  Toeplits  matrices)  this  enables 


immediately  to  derive  the  inverse  of  the  following  Toeplits 
matrices  of  order  (n+l)t 

T>  • «  \  ■  -zrtr  i  -n  ‘ k  ‘ »»» 


Ta  ■  ((  *k 


I  -n  *  k  *  n)) 


where  A,  q,  and  Oi  are  constants. 
The  inverse  becomes 


2n-i  n+j 

(-l)n  i+*^  |  |  (o<-n+k)  1  \  (oc-n+s) 
k-n-i  s-J 


i!  y.  (n-i);  (n-d)i  <X(*-iM) 


(0  <  i,j  <  n) 


(^a  )•< 


f[  (A-qC,<-i+k)(A-q^-n+J+k) 

k-0 

r  f[  («n'V>  ft  <*V) 

r-0  8-0 

r/n-i  s/j 


where  y  -  Anqn^  ’n^  (A-q06  )  (A-q*  ”i+*^ ) 
(0  *  ij  *  n). 


respectively, 
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REMARKS  t 


1*  Note  that  Ha  and  Ta  b>  some 

Ha  — »  Hj_ 

Ta  — >  Tx 

in  the  limit  ae  q  — >  1  and  A  ■  C  ■  1, 

2.  In  Section  2  through  Section  U,  element*  of  the 
matrices  are  indexed  as  (alx,  a12l  )  instead 
of  (sqq,  a0l,  ...  )  which  is  more  conventional 
for  the  Hankel  and  Toepliti  matrices.  The 
formulas  appear  to  be  different  in  two  conventions, 
but  they  are  readily  modified  from  one  system  to 
another. 

3.  Throughout  this  paper,  all  the  empty  products  in 
expressions  are  assumed  to  be  unity. 
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FOURIER  PROCESSING  SHORTCUTS 
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Fort  Monmouth,  New  Jersey 


INTRODUCTION 

1 

The  advent  of  the  Fast  Fourier  Transform  (FFT)  makes  it  possible  to 

2 

rapidly  transform  or  correlate  a  large  volume  of  data.  The  available  FFT 
1,3 

algorithms  accept  as  input  a  sequence  of  discrete  complex  numbers.  When 
the  data  to  be  Fourier  processed  consist  of  real  points,  direct  use  of  the 
FFT  algorithm  is  wasteful  of  computer  storage  space  and  time.  A  technique 
will  be  described  which  processes  certain  redundant  types  of  data  with  a 
savings  of  half  the  time  and  half  the  computer  storage  required  for  direct 
application  of  the  FFT  algorithm.  The  technique  does  not  require  modifi¬ 
cation  to  available  FFT  algorithms.  Typical  of  the  Fourier  processing 
operations  which  can  be  performed  at  substantial  savings  are  crosscorrelation, 
autocorrelation,  convolution,  deconvolution  and  two-channel  Fourier  trans¬ 
formation.  The  theory  behind  these  techniques  is  presented  in  the  first 
part,  and  their  implementation  is  detailed  in  the  second  part. 

THEORY 

The  Fast  Fourier  Transform  is  a  rapid  implementation  of  the  finite 
discrete  Fourier  Transform,  or  FDFT.  The  FDFT  of  a  sequence  of  real  data 
points  is  a  complex  function  with  symmetry  in  its  real  and  imaginary  parts. 
This  symmetry  may  be  used  to  effect  shortcuts  when  working  iflrfth  data 
sequences  which  are  either  real  or  have  real  FDFT's. 

This  paper  was  photographically  reproduced  from  the  author's  copy. 
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Before  demonstrating  this  symmetry,  let  us  review  some  properties  of 
the  Fast  Fourier  Transform  algorithms.  These  algorithms  assume  that  the 
data  sequence  forms  one  cycle  of  a  periodic  function.  Consequently,  the 
Index  of  the  data  sequence  must  be  considered  as  module  N,  where  N  Is  the 
number  of  points  in  the  sequence.  The  number  of  transform  values  returned 
by  the  FFT  algorithms  Is  the  same  as  the  number  of  input  points.  The 
input  points  are  considered  to  be  equispaced  in  each  dimneslon.  In 
general,  the  data  points  are  assumed  to  be  complex  numbers.  This  last 
assumption  of  FFT  algorithms  causes  the  FFT  to  be  slower  than  necessary 
when  a  real  data  sequence  is  to  be  transformed. 

The  symmetry  properties  for  continuous  Fourier  Transforms  are  well 
known.  Those  for  the  FDFT  are  similar.  If  a  real  fund  ton  Is  placed  in 
the  real  part  of  a  complex  function,  and  If  the  imaginary  part  is  set  to 
zero,  then  the  transform  of  the  complex  function  will  be  Hermitian 
symmetric.  The  real  part  of  the  transform  will  have  even  symmetry  while 
the  imaginary  part  of  the  transform  will  have  odd  symmetry.  If  the  real 
function  had  been  placed  in  the  Imaginary  part  of  the  complex  function, 
and  the  real  part  set  to  zero,  then  the  transform  would  again  be 
Hermitian  symmetric,  except  that  now  the  real  part  would  have  odd 
symmetry,  and  the  Imaginary  part  would  have  even  synmetry.  These  symmetry 
properties  are  discussed  in  detail  in  Appendix  A. 

If  the  complex  function  had  been  composed  of  two  real  functions,  one 
each  In  its  real  and  Imaginary  parts,  then  we  see  that  the  transform  will 
no  longer  be  symmetric.  However,  the  real  and  imaginary  parts  of  the 
transform  may  each  be  separated  into  their  even  and  odd  components.  This 
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process  is  called  symmetrization.  Since  we  know  where  each  symmetry 

component  originated,  we  can  reconstruct  the  transforms  of  each  of  the 

two  real  functions  we  started  with.  The  process  of  obtaining  two  spectra 

4,5 

with  one  transform  has  been  described  in  the  literature,  and  is  here 
called  two-channel  Fourier  transformation.  (See  Appendix  B)  But  much 
more  can  be  done. 

The  Fourier  processing  operations  listed  in  the  introduction  can  be 
performed  in  the  transform  space,  and  involve  only  point  by  point  multi¬ 
plications  or  divisions  of  the  transforms  of  two  real  functions.  We 
have  seen  that  all  the  components  of  each  transform  are  available  after 
two-channel  Fourier  transformation.  These  components  may  be  arithmetically 
assembled  to  yield,  on  a  successive  transform,  the  result  of  the  desired 
processing  operation.  Since  this  result  is  a  real  function,  its  transform 
will  have  the  symmetry  previously  described.  Therefore,  the  second  trans¬ 
form  can  be  a  two-channel  one,  and  can  produce  the  results  of  two  processing 
operations  at  once.  This  results  in  substantial  stviigf.-over  direct  appli¬ 
cation  of  the  FFT  algorithm.  When  two  processing  operations  are  carried 
out  at  once,  only  two  transform  operations  are  required,  while  direct 
application  of  the  algorithm  to  each  real  function  requires  six  transform 
operations.  The  time  for  direct  application  may  be  reduced  to  four  trans¬ 
form  operations,  but  the  computer  storage  requirement  then  doubles. 
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The  Fourier  processing  shortcut  Is  two  to  three  times  faster,  and  uses 
up  to  half  the  storage  required  by  direct  application  of  the  FFT.  Note 
that  no  modification  to  the  FFT  algorithm  Is  required  but  that  the  Increase 
in  speed  Is  a  result  of  the  symmetry  of  the  transform  of  real  functions. 

The  central  idea  in  the  following  explanations  is  that  the  FDFT 
algorithm  causes  the  symmetry  components  of  a  complex  function  to  follow 
fixed  paths,  so  that  a  careful  choice  of  the  terms  put  at  the  start  of  each 
path  will  result  in  the  desired  processing  operations  at  the  end  of  the  path. 
The  paths,  or  symmetry  properties,  are  listed  below,  and  demonstrated  in 
Appendix  A.  The  properties  are: 

A.  real  even  function 

B.  real  odd  function 

C.  imaginary  even  function 

D.  Imaginary  odd  function 
The  arrow  indicates  that  the  FDFT  algorithm  applied  to  the  type  of  function 
at  its  tail  produces  a  function  of  the  type  at  its  head.  An  imaginary 
function  is  defined  here  as  a  function  whose  values  are  /T  times  a  real 
number. 

An  example  of  the  application  of  the  symmetry  properties  to  two-channel 
Fourier  transformation  is  given  in  Appendix  B. 

In  the  next  section,  the  Fourier  Processes  are  discussed. 


— ►  real  even  function 
— ►  imaginary  odd  function 
— »  imaginary  even  function 
— >  real  odd  function 
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FOURIER  PROCESSES 


The  term  Fourier  Process  refers  to  the  following  series  of  mathe¬ 
matical  operations.  First,  two  real  functions,  X  and  Y,  are  each  Fourier 
transformed  to  obtain  their  complex  transforms,  X  and  y.  Then  X  and  Y 
are  arithmetically  operated  on  and  combined  to  form  a  single  complex 
function  Z  according  to  a  prescribed  rule.  Inverse  Fourier  transformation 
of  Z  produces  a  specfic  real  function  Q  which  is  the  end  product  of  the 
process.  The  nature  of  Q  will  depend,  for  the  most  part,  on  the  manner 
in  which  Z  is  formed  from  X  and  Y- 
To  summarize: 

Q  =  FDFT  [Z] 
where  Z  =  Z  [X,  Y] 

X  =  FDFT  [X] 

Y  =  FDFT  [y] 

X,  Y  are  real  functions 

FDFT  [-]  denotes  the  finite  discrete  Fourier  transform  of  the 
quantity  in  brackets. 

Figure  1  lists  the  three  basic  Fourier  Processes  treated  in  this 

paper.  Each  process  is  defined  mathematically  in  the  column  at  the  left. 

The  transform  operation  associated  with  each  process  is  listed  to  its 

right.  Thus  convolution  is  associated  with  complex  multiplication  of 

★ 

X  by  Y;  crosscorrelation  with  complex  multiplication  of  X  by  Y  (where  the 
symbol  *  over  a  variable  stands  for  complex  conjugate);  and  deconvolution 
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with  complex  division  of  X  by  Y.  In  the  last  case  multiplication  of  both 
numerator  and  denominator  by  Y*  rationalizes  the  fraction  X/Y. 

Since  deconvolution  Is  not  a  standard  Fourier  Process,  it  will  be 
explained  more  fully.  Deconvolution  Is,  in  a  sense,  the  inverse  of 
convolution.  X  Is  the  function  resulting  from  the  convolution  of  Q  with 

Y.  X  and  Y  are  both  known  and  Q  must  be  determined.  The  transform 
operation  required  to  do  this  is  complex  division  of  X  by  Y.  In  the 
event  that  Y  contains  one  or  more  zero-valued  elements,  the  process 
described  by  X/Y  cannot  be  accomplished  completely  and  consequently  Q 
will  be  only  partially  determined. 

PROCESSING  SHORTCUTS 

Generally,  Fast  Fourier  Transform  Algorithms  are  defined  in  terms  of 
the  operation  on  complex  data.  When  the  input  data  are  completely  real, 
as  they  are  in  many  cases,  this  restriction  is  wasteful.  It  has  been 
shown  that  a  second  real  function  can  be  inserted  into  the  otherwise 
blank  imaginary  part  of  the  Input  data  to  form  an  overall  complex  function 

Z.  (see  figure  2)  After  complex  Fourier  Transformation  the  separate 
complex  transforms  X  and  Y  can  be  separated  from  the  overall  transform 

Z  by  the  application  of  a  set  of  simple  arithmetic  formulae  (see  figure  3). 
These  formulae  are  based  solely  on  the  symmetry  properties  of  the  odd  and 
even  components  of  X  and  Y  (see  Appendix  A)  and  on  the  manner  In  which 
these  components  are  embedded  in  Z  (Appendix  B).  In  some  cases,  however, 
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it  is  possible  to  perform  the  required  operations  on  X  and  y  without 
explicitly  extracting  these  functions  from  Z. 

There  are  thus  two  basic  types  of  shortcut  techniques.  Both  types 
capitalize  on  the  fact  that  X  and  .V  are  completely  real  by  forming  a 
pseudo-complex  function  Z  from  them.  X  forms  the  real  part  of  Z  and  y 
the  imaginary  part.  However,  one  type  of  shortcut  enables  Fourier 
Processing  of  X  and  y  without  explicit  determination  of  their  transforms 
X  and  Y.  The  second  type  of  shortcut  extracts  X  and  Y  from  Z  and  operates 
on  them  directly. 

THE  BASIC  OPERATION 

All  of  the  shortcuts  produce  in  the  transform  space  the  results  of  the 
desired  multiplication  (or  division)  expressed  in  terms  of  the  even  and  odd 
symmetry  components.  For  example,  to  get  XY  in  the  transform  space,  the  follow¬ 
ing  must  be  done. 

XY  =  (Xe+  iX0)(Ye+  1Y0) 

XV  =  XeVVo'U2 
+1XeY0tiX0Ve 

«  ■  XeV  Wo 

+i(XeV0+X0V0) 

To  produce  XY,  put  the  even  components  in  the  even  part  oc  the  real  part 
of  the  transform.  These  terms  will  travel  to  the  even  part  of  the  real 
part  of  the  next  transform  (see  figures  2,  4).  Put  the  odd  components 
times  (-1)  in  the  odd  part  of  the  imaginary  part  of  the  transform.  These 
terms  will  be  multiplied  by  (-1)  and  then  will  travel  to  the  odd  part  of 


even  components 
odd  components 
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the  real  part  of  the  next  transform  (see  figures  2,  4).  Thus  the  term  Q 
of  figure  4  is  the  transform  of  XY,  that  Is  (see  figure  1),  Q  is  X  *  Y, 
the  convolution  function.  This  example  illustrates  the  basic  operation  of 
multiplication  In  terms  of  symmetry  components. 

SHORTCUTS  USING  IMPLICIT  TRANSFORMS 

Figure  4  illustrates  the  first  type  of  shortcut.  (X  *  Y)  Is  deter¬ 
mined  at  Q  without  extracting  X  and  Y  from  Z.  The  asterisk  between  X  and  Y 
denotes  the  convolution  operation.  The  operations  Indicated  take  place 

A 

after  the  initial  Fourier  transformation  of  Z  to  obtain  Z.  RE  and  IM  des- 

A 

ignate  the  real  and  imaginary  parts  of  Z  respectively.  RE 'and  IM'are  the 

A 

new  Z  resulting  from  the  designated  operations.  Even  and  odd  symmetry 
components  of  REAL1  (same  as  RE')  and  IMAGINARY1  (same  as  IM')  have  been 
grouped  separately  with  even  components  in  the  top  half  of  each  box  and 
odd  components  In  the  bottom  half. 

Close  Inspection  of  the  components  which  travel  to  Q  upon  the  second 
Fourier  transformation,  I.e.  the  upper  half  or  even  part  of  REAL'  and  the 
lower  half  or  odd  part  of  IMAGINARY'  identifies  these  components  with  the 

AA 

terms  resulting  from  the  product  XY  or  the  transform  operation  associated 
with  (X  *  Y).  Thus  the  second  transformation  brings  together  the  odd  and 
even  components  of  Q,  i.e.  the  convolution  of  X  with  Y,  in  the  real  part 
of  the  output.  The  components  which  transform  to  become  W  are  by-products 
of  the  entire  operation  but  they  are  Irrelevant.  Figure  5  Illustrates  the 
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crosscorrelation  process  by  a  similar  method,  the  difference  being  the 
initial  inversion  operation  indicated.  The  real  part  of  Z  that  is  RE  is 
inverted  immediately  after  the  first  Fourier  Transformation.  Inversion 
means  replacement  of  the  independent  variable  t  by  -t  ,  which  is  the 
same  as  rotating  Z  by  180  degrees.  Inversion  is  also  equivalent  to  multi¬ 
plying  the  odd  symmetry  component  of  a  function  by  (-1).  Since  the  odd 
symmetry  component  of  RE  is  the  imaginary  part  of  the  transform  Y,  in¬ 
version  here  is  equivalent  to  replacing  Y  by  Y*.  By  definition  this 
replaces  the  convolution  function  in  the  output,  Q,  by  the  crosscorrelation 

Figure  6  illustrates  another  shortcut  technique  making  use  of 
implicit  transforms.  The  task  is  to  convolve  two  independent  real  func¬ 
tions  X  and  Y  each  with  a  third  real  function  T.  T  has  the  very  convenient 
property  that  it  is  also  evenly  symmetric,  so  that,  as  a  result  T  will  be 
completely  real.  It  is  assumed  that  T  has  been  pre-calculated  and  stored. 
As  in  the  previous  examples  the  complex  function  Z  is  formed  from  two  real 
functions  X  and  Y.  After  Fourier  transformation,  both  RE  and  IM  are 
multiplied  by  T.  Now  RE'  and  IM'  are  mutually  exchanged.  Such  an  exchange 
is  one  way  of  producing  an  inverse  Fourier  transformation  except  that  the 
results  of  the  second  transformation  are  also  exchanged,  with  the  real 
part  containing  Y  *  T,  and  the  imaginary  part,  X  *  T. 

Figure  7  illustrates  the  deconvolution  process  carried  out  in  exactly 
the  same  manner.  RE  and  IM  are  each  divided  by  T  and  the  deconvolution 
functiors  result  in  the  output.  The  symbol  */  indicates  deconvolution. 


function. 
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SHORTCUTS  USING  EXPLICIT  TRANSFORMS 


Figures  8-11  outline  shortcut  examples  which  assume  explicit  ex- 

A  A  A 

traction  of  X  and  Y  as  an  initial  step,  (liefer  to  figure  3)  Once  X  and  Y 

A 

have  been  separated  from  Z,  they  can  be  operated  on  directly  to  produce 
any  Fourier  process  desired.  If  the  results  are  grouped  carefully  to 
form  a  new  Z,  successive  Fourier  transformation  can  result  in  two  meaning¬ 
ful  outputs  instead  of  one. 

Figure  8  demonstrates  simultaneous  convolution  and  crosscorrelation 

AAA  A 

by  working  with  the  extracted  Fourier  components  Xe,  Xo,  Ye,  and  Yo.  The 
components  grouped  to  form  Z  represent  both  XY  and  XY*.  The  components 
which  travel  to  form  Q  upon  successive  transformation,  including  of  course 

A  A 

the  (-1)  are  the  terms  resulting  from  XY.  Those  which  travel  to  form  W  are 

A  A 

the  terms  resulting  from  XY*. 

In  Figure  9  the  very  same  technique  is  used  to  produce  both  deconvo¬ 
lution  and  the  deconvolution  spread  function.  The  deconvolution  spread 
function  W  Is  defined  in  such  a  way  that  X  *  W  =  Q. 

Figure  10  combines  the  previous  examples  producing  both  deconvolution 
and  crosscorrelation  at  Q  and  W  respectively. 

Lastly.flgure  11  presents  the  relatively  simple  example  of  producing 
two  autocorrelation  outputs;  Q  the  autocorrelation  function  of  X.and  W 
the  autocorrelation  function  of  Y. 
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SUMMARY 

What  has  been  presented  in  this  paper  is  a  basic  technique  which 
allows  the  FFT  Algorithms  to  be  employed  most  efficiently.  All  of  the 
techniques  are  based  on  the  fact  that  the  input  data  are  completely  real 
and  the  resulting  complex  Fourier  transforms  contain  inherent  symmetry. 

It  is  this  inherent  symmetry  which  allows  the  transforms  of  two  real 
functions  to  be  joined  together  as  one  transform  operation  and  yet 
operated  on  as  though  they  existed  separately.  Also,  as  we  have  shown, 
it  is  sometimes  possible  to  perform  two  Fourier  processing  operations 
involving  the  same  two  functions  in  almost  the  same  time  and  space.  When 
the  shortcut*,  are  utilized  in  a  machine  program  processing *time  and 
memory  requirements  run  50  -  75$  less  than  straightforward  application 
of  the  FFT  Algorithms;  savings  which  can  be  quite  significant  when  large 
volumes  of  data  are  to  be  processed  and  either  computer  processing  time 
or  memory  space  is  limited. 
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APPENDIX  A  -  Demonstration  of  Symmetry  Properties 


The  symmetry  properties  will  now  be  demonstrated.  The  properties  are: 

A  -  The  FDFT  of  a  real  even  function  is  a  real  even  function, 

B  -  The  FDFT  of  a  real  odd  function  is  an  imaginary  odd  function, 

C  -  The  FDFT  of  an  imaginary  even  function  is  an  imaginary  even  function, 

D  -  The  FDFT  of  an  imaginary  odd  function  Is  a  real  odd  function. 

An  imaginary  function  Is  defined  here  as  a  function  whose  values  are  1 
times  a  real  number. 

The  notation  used  Is  this:  hatted  variables  denote  variables  in  the 
transform  (or  frequency)  domain,  while  unhatted  variables  denote  space 
(or  time)  domain  variables. 

Let  X(t),  t  =  0,  1,  2,  . .  N-l  be  the  sequence  to  be  transformed, 

and  let  X(t),  t  =  0,  1,  2,  . .  N-l  be  the  Finite  Discrete  Fourier 

Transform  (FDFT)  of  X(t). 

In  order  to  demonstrate  the  symmetry  properties  of  the  finite  discrete 
Fourier  Transform,  the  following  must  first  be  proved: 

1)  the  FDFT  of  a  real  function  is  hermitian  symmetric,  that  is,  if 

the  transform  is  inverted,  then  its  complex  conjugate  is  obtained.  Inversion 
here  means  rotating  the  function  by  180  degrees. 

2)  The  FDFT  of  an  inverted  function  is  itself  inverted. 

To  demonstrate  the  hermitian  symnetric  property,  replace  t  by  -t  in 
the  defining  equation  of  the  FDFT,  equation  1 
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X  ( t )  =  z  X  ( t )  exp  ( 2n  1  tt/N ) 

t-0 

N-l 

X(-t)  =  z  X ( t )  exp  (-2nitt/N) 

t=0 

If  X(t)  is  real , 


X(-t)  =[z  X(t)  exp  (+2nitt/N)l 
t-0 


X(-t)  =  [X(t)] 


eq.  1 

eq.  2 

eq.  3 

eq.  4 


where  *  denotes  the  complex  conjugate.  Equation  4  defines  a  Hermitian 
symmetrical  function. 

The  remaining  symmetry  properties  are  based  on  the  following  equation: 

X(-t)  =  FDFT  [X(-t)]  eq.  5 

i.e.,  if  X(t)  is  inverted,  X(t)  becomes  inverted.  This  can  be  shown  from 
the  definition  of  the  FDFT.  Replace  t  by  -t  in  equation  1. 


X ( - t )  =  Z  X ( t )  exp  ( -2n i tt/N )  eq.  6 

t=0 

In  order  to  place  the  right  side  of  equation  6  in  standard  form 
let  y  =  N-t,  so  that  X(t)  =  X(N-y)  =  X(-y),  and 

X(-t)  =  l  X(-y)  exp  [2nit(N-y)/N]  eq.  7 

y*N 

X (- t)  =  s  X ( -y )  exp  [+2nity/N]  exp  [-2nit] 

y»N 

1 

=  l  X{-y)  exp  ( 2H i ty/N)  eq.  8 

y=N 
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Since  t  is  an  Integer,  exp  [-2nlt]  =  exp  [-2nlO]  =  e°  =  1 . 

N-1 

X ( -t)  =  X ( -N )  exp  (2nit  N/N)  +  E  X(-y)  exp  (2ni ty/N) 

y-1 

Since  t  is  an  integer, 

exp  (2nitN/N)  =  exp  (2nit  0/N). 

Because  X  is  periodic,  X ( -N )  =  X  (0).  Then 

X(-N)  exp  (2ni tN/N)  =  X(0)  exp  ( 2ni tO/N ) 

„  .  N-1 

X(-t)  =  e  X ( -y )  exp  (2nity/N) 
y=>0 

The  right  side  of  equation  12  is  now  in  the  form  of  equation  1. 
Since  y  is  a  dummy  index, 

X(-t)  =  FDFT  [X(-t)] 


eq.  9 

eq.  10 

eq.  11 
eq.  12 


eq.  13 


To  show  that  the  FDFT  of  a  real  even  function  is  real  and  even 
(Property  A),  use  the  previous  two  results.  Let  Xg(t)  be  an  even  function, 
such  that 


xe(-t)  -  Xe(t) 

Let  Xe(t)  =  FDFT  [Xe(t)]  Then 


eq.  14 


X e ( t )  =  FDFT  [Xe(t)J  =  FDFT  [Xe(-t)  =  Xe(-t)  =  [Xp(t)] 

t  4 _  o  /;>  _  _ j. 


Xe(t)  is  even 
Xp(t)  is  real 


'e' 

4 


eq.  15 


^Property  A 


which  demonstrates  the  required  result. 
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To  show  that  the  FDFT  of  a  real  odd*  function  -Is  Imaginary  and  odd 
(Property  B),  let  Xq( t)  be  real  and  odd,  such  that 

Xo(-t)  s  -Xfli-t)  eq.  16 

and  let  XQ( t)  be  its  FDFT. 

i  , . . . .73 — im^rv  -T~ - } 

X0(t)  =  FDFT  [X0(t)]  ■  FDFT  [-X0(-t)]-X0(-t)  =  -[X0(t)]#  eq.  17 

which  demonstrates  the  required  property,  property  B. 

The  symmetry  properties  for  Imaginary  functions  (Properties  C  and  D), 
are  proven  analogously  to  those  for  real  functions.  Let  Xe(t)  and  Xo(t) 
be  real  even  and  real  odd  functions  respectively.  Then  1Xe(t)  and  iXo(t) 
are  imaginary  even  and  imaginary  odd  functions  respectively.  The  FDFT  is 
a  linear  operation.  (See  equation  1.)  By  the  previously  demonstrated 
properties, 

FDFT  [1Xe(t)]  =  1  FDFT  [Xe(t)]  =  1Xe(t)  eq.  18 

FDFT  [1X0(t)]«  i  FDFT  [X0(t)]  =  1X0(t)  «1[lifat)]  =  -X'0(t)  eq.  19 

A  A 

where  X#(t)  Is  a  real  and  odd  function,  by  Property  B. 

Since  Xft(t)  is  a  real  even  function  (by  Property  A),  property  C  is  demonstrated. 

A  A 

Since  X#(t)  is  a  real  odd  function  (by  Property  B),  property  D  is  demonstrated. 
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APPENDIX  B  -  Application  of  Symmetry  Properties  to  Two-Channel  Fourier 
Transformation 

Let  X ( t )  and  Y(t)  be  two  real  functions  whose  transforms  are  desired. 
Express  each  function  in  terms  of  its  symmetry  components,  defined  as 
follows: 

Xe(t)  i  1/2  flX(t)  +  X(-t)]  eq.  20 

X0(t)  =  1/2  [X(t)  -  X(-t)]  eq.  21 

Ye(t)  and  Y0(t)  are  similarly  defined.  Note  that  these  functions  are 
even  and  odd  in  the  conventional  sense.  The  process  shown  In  equations 
20  and  21  is  called  symmetrlzatlon.  These  two  equations  may  be  added  to  get 

x(t)  =  Xe(t)  +  X0(t)  eq.  22 

y(t)  =  ye(t)  +  y0(t) 

To  effect  a  two-channel  transformation,  place  the  real  function  X  In  the 
real  part  of  a  new  complex  function  Z,  and  place  the  real  function  y  In 
the  imaginary  part  of  Z. 

Let 

Z(t)  =  X(t)+  ir(t)  eq.  24 

Substitute  equations  22,  23  In  equation  24. 

z(t)  =  xe+x0+  1Ye+  ify  eq.  25 

Since  the  FDFT  Is  a  linear  operator, 

z  (t)  "  V  V  V  <f0  eq.  26 
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Apply  symmetry  properties  A,  B,  C,  D  respectively  to  the  successive  terms 
of  equation  26. 

Real  [Z]  -  Xe-y^  eq.  27 

A  A 

Imag  [Z]  *  ye+X0  eq.  28 

Since  the  even  and  odd  parts  of  a  function  are  separable  by  synmetrlzatlon 

A  A 

(  eq.  20,  21),  the  transforms  X  and  y  can  be  recovered 

X ( t)  *  Even  part  of  ReZ  +  1  odd  part  of  ImZ  eq.  29 

A  A  A  A 

y (t)  »  Even  part  of  ImZ  -  1  odd  part  of  ReZ  eq.  30 

This  separation  may  be  carried  out  explicitly  as  above  to  get  the  separate 
spectra,  or  may  be  done  Implicitly  to  carry  out  various  Fourier  processing 
operations. 
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COMPUTER  PROGRAMMING  FOR  ACCURACY 


J.M,  Yohe 

Mathematics  Research  Center,  U.S.  Army 
The  University  of  Wisconsin 
Madison,  Wisconsin 


ABSTRACT.  In  most  computations,  it  is  tacitly  assumed  that  the  results 
produced  by  the  program  are  "accurate  enough".  This  assumption  is  not  always 
valid.  In  this  report,  we  discuss  several  possible  sources  of  error  in  digital 
computation,  and  we  list  several  steps  which  can  be  taken  to  guard  against  some 
types  of  error  and  determine  an  upper  bound  for  the  effects  of  other  types. 

We  illustrate  these  techniques  with  our  experience  in  writing  a  program  to 
locate  zeros  of  the  Riemann  zeta  function. 

1.  INTRODUCTION.  In  many  cases,  it  is  assumed  that  the  results  of  a 
digital  computation  are  right  if  they  "look  right" — whatever  that  means. 

This  assumption  is  not  always  valid;  indeed,  sometimes  it  cannot  be  tolerated. 

For  example,  if  a  computation  is  done  for  the  purpose  of  proving  a  mathematical 
theorem,  the  proof  is  no  more  reliable  than  the  results  of  the  computation. 

To  put  such  a  proof  on  reasonably  solid  footing,  the  computational  results 
must  be  known  to  be  correct  with  high  probability.  It  would  be  best,  of 
course,  to  prove  the  results  correct  with  mathematical  rigor;  but  the  sources 
of  possible  failure  are  so  numerous  that  such  a  proof  is  virtually  impossible. 

We  will  therefore  address  ourselves  to  the  problem  of  increasing  the  probability 
that  the  results  are  correct. 

Unfortunately,  the  problems  of  detecting,  eliminating,  and/or  controlling 
error  are  complex  and  ill-defined.  Most  of  the  textbooks  on  computer  programming 
we  have  seen  treat  this  subject  lightly  if  at  all;  not  because  it  is  not  important, 
but  because  any  treatment  of  it  is  extremely  difficult  without  using  concrete 
examples . 

In  tuis  report  we  will  identify  several  general  sources  of  error,  and 
list  specific  steps  which  can  be  taken  to  either  guard  against  error,  or 
determine  an  upper  bound  on  its  effects.  We  will  then  illustrate  these 
concepts  with  our  experience  in  writing  a  program  to  locate  roots  of  the 
Riemann  zeta  function.  The  relevant  properties  of  this  function  are  covered 
in  [8]  and  aspects  of  the  mathematical  analysis,  programming,  and  running  of 
the  programs  are  covered  in  [7],  The  project  was  initiated  by  J.  Barkley 
Rosser  and  Lowell  Schoenfeld,  and  che  computation  was  designed  by  Schoenfeld 
and  the  author.  The  computations  were  run  on  the  CDC  3600  at  the  University 
of  Wisconsin  Computing  Center  during  the  period  1964-1968. 


Sponsored  by  the  Mathematics  Research  Center,  United  States  Army,  Madison, 
Wisconsin,  under  Contract  No.  DA-31-124-ARO-D-462 . 


The  remainder  of  this  paper  was  reproduced  photographically  from  the  author's 
manuscript.  , 
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We  will  limit  ourselves  to  matters  of  producing  computed  results  for 
correctly  formulated  problems.  Of  course,  there  is  a  possibility  of  error  in  the 
formulation  and  analysis  of  the  problem,  but  this  is  not  usually  the  concern  of 
the  programmer.  In  any  case,  many  of  these  considerations  have  been  treated 
in  the  literature;  for  example,  in  [4]  there  are  exhaustive  treatments  of  many 
problems,  and  Volume  I  of  [4]  contains  an  extensive  bibliography.  A  general 
discussion  may  be  found  in  [3];  [10]  is  another  valuable  reference. 

No  claim  is  made  that  this  paper  exhausts  all  possible  sources  of  error, 
or  that  the  methods  suggested  are  best  possible.  However,  it  is  hoped  that  it 
will  prove  valuable  as  a  starting  point  for  the  planning  of  computations  whose 
results  must  be  accurate. 

2.  Sources  of  error.  In  this  section,  we  give  a  condensed  list  of  steps  which 
might  be  taken  to  deal  with  various  sources  of  error;  in  the  remaining  sections, 
we  expand  on  these  ideas  and  illustrate  some  of  these  steps  with  concrete 
exar  nles. 

A.  Errors  due  to  hardware  limitations : 

1.  Determine  the  limitations  of  the  hardware  and  the  action 

of  each  instruction  on  the  data. 

2.  Determine  the  type  of  error  analysis  to  be  used. 

3.  Plan  the  computation  so  as  to  minimize  the  effect  of 
hardware  limitations. 

4.  Make  certain  that  any  changes  in  hardware  will  be  ade¬ 
quately  publicized;  check  all  such  changes  for  possible 
effect  on  program. 

B.  Errors  due  to  software  limitations: 

1.  Determine  the  limitations  of  the  software,  including  the 
accuracy  of  all  subroutines  to  be  used  ( including  input/ 
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output  conversions) ;  perform  error  analysis  on  all  mathe 


matical  and  input/ output  subroutines,  both  those  used  by 
the  program  and  those  used  by  the  compiler/assembler. 

2.  Decide  whether  library  subroutines  should  be  used  or 
whether  special  purpose  routines  should  be  written. 

3.  Make  certain  that  any  changes  in  software  will  be  ade¬ 
quately  publicized;  check  all  such  changes  for  possible 
effect  on  program. 

C.  Errors  due  to  hardware  failure: 

1.  Make  sure  the  hardware  operates  as  per  specifications. 

2.  Determine  what  checking  features  are  built  into  the 
hardware. 

3.  Determine  what  checks  can  be  built  into  the  program  to 
help  detect  hardware  failure. 

4.  Run  the  program  on  two  different  computers  or_run  a 
second  program  to  check  thoroughly  the  results  of  the 
first  program  or  run  the  same  program  twice  and  compare 
the  results  or  design  the  program  to  effectively  do  the 
computation  twice,  using  different  methods. 

5.  Design  the  program  to  avoid  or  minimize  use  of  hardware 
features  that  are  either  untested  or  are  known  to  be 
troublesome. 

6.  If  tapes  are  used,  use  them  as  little  as  possible  and 
record  in  the  lowest  density  practicable. 
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7.  Design  program  to  run  efficiently  in  relatively  short 
segments  (  say  1/2  hour  or  less)  rather  than  requiring 
long  unbroken  runs. 

D.  Errors  due  to  software  failure: 

1.  Check  that  all  software  to  be  used  performs  correctly  and 
to  specifications. 

2.  Determine  what  checks  are  built  into  the  software. 

3.  Check  that  software  documentation  is  complete,  concise, 
and  accurate.  Test  all  features  of  software  for  which  docu¬ 
mentation  is  cloudy,  and  document  the  results  of  the  test. 

4.  If  diagnostic  capabilities  of  the  compiler  are  not  exhaus¬ 
tive,  compile  the  program  using  another  compiler  for 
checking  purposes. 

5.  Check  the  machine  language  program  produced  by  the 
compiler. 

6.  Determine  what  checks  can  be  built  into  the  program  to 
help  detect  software  failure. 

E.  Errors  due  to  program  failure: 

1.  Flow  chart  the  program. 

2.  Check  the  flow  chart  (  both  programmer  and  problem 
originator  should  do  this) . 

3.  Prepare  a  glossary  of  variable  names  used  in  the  program. 

4.  Keep  the  flow  charts  and  glossary  up  to  date  as  program¬ 
ming  progresses. 
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5.  Desk-check  program  (a  second  programmer  should  do  this) . 

6.  Check  the  program  against  another  program,  a  hand  com¬ 
puted  case,  or  against  published  tables.  Explain  com¬ 
pletely  all  discrepancies. 

7.  Check  that  all  branches  of  the  program  work  properly  and 
interact  properly. 

8.  Provide  for  expansion  of  the  program  and/or  its  task  with¬ 
out  undue  reprogramming,  (for  example,  use  variables  for 
limits  of  loops  and  make  I/O  unit  designations  general). 

F.  Errors  due  to  faulty  operation: 

1.  Design  the  program  for  ease  of  operation  and  minimal 
action  by  the  computer  operator. 

2.  Determine  what  checks  the  program  can  make  on  input  data. 

3.  Provide  for  print-out  of  all  input  data  for  visual  checking 
whenever  possible. 

4.  Determine  scrupulously  the  effects  of  program  interrupts 
by  the  operator,  or  of  other  steps  that  the  operator  may 
initiate  during  the  running  of  the  program. 

G.  Errors  due  to  inadequate  planning: 

1.  Determine  in  detail  the  present  and  future  objectives  of 
the  program. 

2.  Determine  what  input  and  output  is  required  for  all 

purposes,  and  the  formats  to  be  used  for  it. 

3.  Determine  the  names  to  be  used  for  variables  in  the 

program. 
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4.  Determine  what  checks  will  need  to  be  made,  how  they 
will  be  made,  and  decide  how  the  computation  can  best 
be  designed  to  facilitate  checking. 

5.  Make  adequate  provisions  for  restarts. 

6.  Take  into  consideration  all  points  listed  in  the  previous 

six  categories. 

3.  Errors  due  to  hardware  limitations.  The  most  prevalent  error  of  this  type  is  the 
so  called  "round-off  error".  It  cannot  be  eliminated,  but  its  effect  can  be  determined 
since  it  is  systematic  and  completely  predictible,  given  that  the  action  of  the 
hardware  on  data  is  known  for  the  various  operations.  ( The  computation  can 
usually  be  designed  so  as  to  reduce  the  effect  of  round-off  error,  but  we  will 
not  attempt  to  discuss  this  here).  There  is  a  large  body  of  work  dealing  with 
the  analysis  and  control  of  round-off  error.  We  will  mention  two  such  methods. 

One  popular  method  of  error  estimation  is  interval  arithmetic,  which  was 
developed  by  R.  E.  Moore;  it  is  detailed  in  [  2],  A  general  discussion  of 
interval  arithmetic,  some  of  its  applications,  and  techniques  for  programming 
it  can  be  found  in  [5],  and  an  implementation  of  interval  arithmetic  for  the 
CDC  1604  and  the  CDC  3600  can  be  found  in  [6], 

Interval  arithmetic  has  the  advantage  of  being  relatively  easy  to  use. 

Its  principal  disadvantages  are  that  it  is  perhaps  somewhat  slower  than  other 
methods,  and  the  facility  for  performing  interval  arithmetic  is  not,  to  our  knowledge, 
incorporated  in  the  hardware  of  any  popular  computer.  Therefore,  interval  arithemetic 
must  be  performed  by  a  program.  Such  programs  are  not  generally  available  for 
most  computers  and,  even  if  they  are  available,  they  are  subject  to  many  of  the  same 
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sources  of  error  as  any  other  program. 


Another  method  of  determining  bounds  on  this  type  of  error  is  the  a  priori 
error  analysis  introduced  for  floating  point  arithmetic  in  I960  by  Wilkinson  [9]. 

In  Chapter  IX  of  [  11] ,  Lowell  Schoenfeld  discusses  this  method,  compares 
a  priori  analysis  with  interval  arithmetic,  and  gives  specific  values  of  Wilkinson's 
parameters  for  error  analysis  of  CDC  3600  computations. 

In  the  zeta  function  programs,  we  elected  to  use  the  a  priori  analysis. 

The  primary  reason  for  this  choice  was  speed  of  computation.  The  analysis 
was  checked  carefully,  and  then  the  error  bounds  obtained  were  multiplied  by 
a  factor  of  2. 1  for  added  safety. 

An  a  priori  error  analysis  is  properly  part  of  the  planning  of  a  computa¬ 
tion.  However,  for  this  type  of  analysis,  the  action  of  each  machine  instruc¬ 
tion  must  be  completely  known,  and  the  precise  order  in  which  the  instructions 
are  executed  must  also  be  known.  Consequently,  the  planning  and  the  pro¬ 
gramming  must  be  interactive  in  this  case.  The  program  affects  the  error 
analysis,  and  the  error  analysis  in  turn  may  reveal  places  in  the  program  where 
modification  would  yield  a  more  accurate  computation. 

Such  error  analysis  involves  a  great  deal  of  tedious  calculation.  Although 
this  may  seem  to  be  an  unnecessary  amount  of  work,  it  has  values  other  than 
just  bounding  the  error  in  the  final  computation.  For  example,  these  error  bounds 
alerted  us  to  an  error  in  the  compiler  which  would  probably  never  have  been 
noticed  otherwise;  see  [7], 

We  should  not  leave  this  subject  without  mentioning  one  other  point  . 

On  some  computers,  it  is  possible  to  alter  the  rounding  algorithm;  programs 
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may  then  be  run  twice,  using  a  different  rounding  algorithm  each  time.  While 
this  method  may  well  be  of  value  in  deciding  whether  the  answers  "look  right", 
it  does  not  in  general  yield  rigorous  bounds  on  this  type  of  error. 

4.  Errors  due  to  software  limitations.  As  the  hardware  has  certain  limitations 
in  its  performance,  so  does  the  software.  In  this  section,  we  assume  that  the 
software  is  functioning  as  designed,  and  that  therefore  errors  of  this  type  will 
be  systematic  and  completely  predictable  once  the  action  of  the  software  is 
known. 

Software  limitations  include  errors  in  mathematical  functions,  errors  in 
binary-to-decimal  and  decimal -to -binary  conversion  (  both  in  compilers  and 
assemblers  and  in  the  library  subroutines  used  by  the  program  at  object  time)  , 
etc. 

If  listings  are  available  and  not  too  many  routines  are  to  be  used,  one 
may  elect  to  go  ahead  with  an  a  priori  analysis.  This  is  tedious  and  time 
consuming  work,  and  one  always  runs  the  risk  that  the  routines  will  be  changed 
after  the  analysis  is  completed.  In  the  zeta  function  programs  we  found  during 
debugging  that  the  results  of  two  consecutive  runs  differed  in  places  where 
presumably  no  changes  had  been  made.  Investigation  showed  that  several  decimal 
constants  had  been  converted  to  binary  differently  in  the  two  runs.  The  reason 
was,  we  found,  that  the  conversion  routine  used  by  the  compiler  had  been  changed 
between  the  two  runs,  and  the  two  routines  gave  slightly  different  results.  We 
had  not  been  informed  of  this  change. 

An  a  priori  analysis  of  input-output  routines  may  be  extremely  difficult 
due  to  the  complexity  of  the  routines.  For  the  mathematical  routines,  the 
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a  priori  analysis  Involves  analysis  of  the  error  due  to  mathematical  approxima¬ 
tion  as  well  as  error  due  to  hardware  limitations.  The  job  is,  indc  d,  a 
monumental  one. 

Another  alternative  is  to  write  home-made  routines  for  the  jobs  which 
need  to  be  done.  For  example,  in  the  zeta  function  program,  we  wrote  a  special 
high-precision  conversion  routine  to  convert  critical  constants  to  binary.  The 
binary  numbers  were  then  given  to  the  compiler,  which  merely  copied  them; 
hence  no  error  was  introduced  by  the  compiler.  Similarly,  we  wrote  our  own 
programs  for  logarithm,  square  root,  and  cosine. 

The  reader  may  wonder  why  all  of  this  is  necessary.  After  all,  the 
manufacturer  does  provide  error  bounds,  at  least  on  the  mathematical  routines; 
why  not  use  those?  The  answer  is  that  the  bounds  customarily  provided  by 
the  manufacturer  are  either  bounds  on  the  mathematical  approximation  or  bounds 
determined  empirically;  we  know  of  no  case  where  the  bounds  given  are  asserted 
to  be  mathematically  rigorous. 

5.  Errors  due  to  hardware  failure.  Up  to  this  point,  we  have  considered  only 
predictable  and  systematic  errors.  We  now  embark  upon  a  consideration  of 
errors  which  are  less  frequent,  totally  unpredictable,  and  almost  totally  outside 
the  control  of  the  programmer.  Errors  of  the  type  to  be  considered  in  this  section 
include  main  frame  failure,  failure  of  disks,  drums,  card  readers,  tape  units,  etc. 
The  list  is  endless,  or  seemingly  so.  We  will  also  include  here  errors  due  to 
failure  of  the  magnetic  tapes  themselves. 

Such  errors  are  difficult  to  find  and  difficult  to  guard  against.  All 
reasonable  checks  which  can  be  built  into  a  program  should  be  ( such  checks 
are  valuable  for  catching  other  types  of  errors  as  well). 
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For  example,  it  is  reasonable  to  test  that  the  sign  of  a  computed  result  is 
correct,  or  that  the  magnitude  of  the  computed  result  is  within  certain  bounds, 
if  such  information  is  known.  The  nature  of  such  checks  for  a  particular  prob¬ 
lem  will  depend  on  the  analysis  of  the  problem  in  question,  and  no  general  rules 
can  be  given. 

One  of  the  surest  methods  of  catching  errors  due  to  hardware  failure  is 
to  run  the  entire  computation  twice  —  preferably  using  different  programs  and 
different  machines.  This,  however,  may  not  be  feasible.  If  the  computation 
can  be  run  twice  on  the  same  machine  using  totally  different  programs,  this 
would  be  nearly  as  good.  Either  of  these  two  alternatives  is  also  an  effective 
check  for  other  types  of  errors.  A  third  choice  is  to  run  the  computation  twice 
on  the  same  machine  using  slightly  different  programs.  The  answers  obtained  by 
the  two  runs  should  be  compared  and  all  discrepancies  explained. 

For  the  zeta  function  program,  we  effectively  ran  the  computation  twice, 
using  slightly  different  programs.  The  first  program  computed  certain  values  of 
the  independent  variable  tau,  ( whose  accuracy  was  not  critical)  and  a  certain 
function  f  of  tau.  The  values  of  tau  were  chosen  so  that  consecutive  values 
of  f(  tau)  had  opposite  signs  and  were  larger  in  magnitude  than  the  a  priori 
error  bounds.  The  first  program  then  wrote  the  tau  values  and  the  sign  of 
i(tau)  on  magnetic  tape.  The  second  program  read  the  tape,  recomputed  f(  tau), 
and  checked  to  make  sure  that  the  sign  of  f(  tau)  read  from  tape  agreed  with 
the  sign  just  computed.  The  program  also  made  several  other  checks. 

The  hardware  was  not  entirely  error  free,  although  it  did  prove  to  be 
remarkably  reliable  (  see  [7]).  On  one  occasion,  we  had  trouble  due  to  a  bit 
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dropping  in  one  of  the  CPU  registers.  Fortunately,  this  manifested  itself  by 
causing  either  an  attempted  write  on  a  nonexistent  tape  or  a  format  length 
error  on  printer  output.  In  either  case,  the  operating  system  merely  aborted 
the  program. 

There  were  two  other  means  of  checking  the  actual  computation  on  a 
spot-check  basis.  The  first  of  these  involved  the  use  of  a  totally  different 
program  to  perform  the  same  computations.  This  program  was  applied  to  selected 
points  from  each  run  of  the  computation,  and  the  results  compared  against  the 
results  obtained  by  the  main  program.  The  second  method  of  checking  involved 
the  use  of  ten  variations  of  the  method  of  computing  f(  tau)  (see  [7]). 
Periodically,  all  ten  methods  were  applied  to  one  value  of  the  independent  vari¬ 
able  and  the  results  were  compared.  The  results  of  each  pair  of  methods  were 
required  to  agree  to  within  the  sum  of  the  a  priori  error  bounds  for  the  two  methods. 

The  tapes  and  tape  units  gave  us  trouble  practically  from  the 
beginning  of  the  computation.  The  first  program  was  designed  so  that,  on  a 
restart,  it  would  first  copy  the  tape  from  the  previous  run  onto  a  fresh  tape,  and 
then  add  the  results  of  the  current  computations  to  the  end  of  the  new  tape.  In 
this  manner,  the  previous  tape  could  be  file  protected,  and  the  risk  of  damage 
or  destruction  was  minimized.  We  rotated  among  three  and  sometimes  four 
tapes,  and  this  gave  us  a  further  measure  of  protection  in  the  event  that  a  run 
was  aborted.  This  method  seemed  to  work  as  well  as  could  be  expected  with 
tapes;  and  since  the  hardware  is  designed  for  read-after-write  and  the  software 
we  used  implements  this  check,  we  had  comparatively  few  parity  errors  on  the 
tapes  we  wrote.  Those  tapes  which  did  seem  to  have  bad  spots  on  them  when 
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reading  was  attempted  could  almost  invariably  be  read  by  mounting  them  on 
the  physical  unit  on  which  they  were  originally  written.  We  also  used  nearly 
new  tapes  for  the  output. 

As  the  second  program  progressed,  tape  failures  became  more  frequent. 
We  found  that  by  switching  from  the  previous  recording  density  of  800  bpi  to 
556  bpi  the  troubles  were  alleviated  to  some  extent;  the  remainder  of  the  com¬ 
putation  was  recorded  on  tape  at  556  bpi.  We  would  have  gone  to  200  bpi, 
except  that  the  number  of  reels  of  tape  required  would  have  been  prohibitive. 

Whenever  two  reels  of  tape  were  supposed  to  be  identical,  they  were 
compared  to  make  sure  that  they  were  identical.  If  they  were  not,  any  discrep¬ 
ancies  were  checked  to  determine  which  tape,  if  either,  was  correct.  We 
always  saw  to  it  that  we  had  duplicate  copies  of  all  important  reels  of  tape; 
this  saved  having  to  rerun  portions  of  the  computation  to  recreate  tape  that 
could  not  be  read. 

Whenever  possible,  the  program  should  be  designed  to  run  in  relatively 
short  segments  rather  than  requiring  long,  unbroken  runs.  This  will  reduce  the 
probability  of  hardware  failure  during  any  one  run,  and  will  also  reduce  the 
computer  time  lost  if  such  an  enror  does  occur.  The  zeta  function  program  was 
not  designed  this  way,  and  there  was  an  inordinate  amount  of  computer  time 
lost  due  to  hardware  (  and  software)  failures. 

6.  Errors  due  to  software  failure.  Software  failure  is  not  as  infrequent  as 
might  be  hoped,  especially  if  the  computer  or  the  software  is  new.  Like  any 
other  program,  software  must  be  debugged;  yet  most  of  us  tend  to  assume  that, 
by  the  time  software  is  released,  it  is  relatively  bug-free.  And,  in  fact,  the 
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obvious  bugs  are  generally  out.  That  leaves  the  subtle  ones  to  plague  us. 
Dealing  with  these  bugs  requires  all  of  the  techniques  mentioned  thus  far,  to¬ 
gether  with  some  to  be  mentioned  later. 

Certain  experiences  with  the  zeta  programs  might  be  instructive.  Other 
such  experiences  are  noted  in  [7]. 

Various  errors  and  inadequacies  were  found  in  the  compiler.  For  example, 
the  compiler  converted  a  double  precision  constant  in  an  IF  statement  to 
zero  instead  of  the  correct  value.  This  manifested  itself  by  branching  the 
wrong  way  in  the  IF  statement;  print-outs  from  the  program  showed  that 
this  had  happened.  The  error  was  identified  by  examining  the  machine  language 
listing  of  the  compiled  program. 

In  the  Drum  Scope  operating  system  we  encountered  one  of  the  most 
serious  bugs  found  in  software  during  the  entire  project  Shortly  after  the 
Drum  Scope  operating  system  replaced  the  Tape  Scope  system,  one  of  the  checks 
built  into  the  second  program  began  failing.  This  would  only  occur  once  in  a 
run,  since  the  program  was  designed  to  stop  immediately  in  this  event.  But  on 
rerunning,  the  check  would  not  fail.  Ihis  was  explained  only  after  it  was 
discovered  that  the  monitor  system  failed  to  restore  an  internal  register  after 
the  operator  interrupted  to  type  in  a  message.  (This  register  was  ordinarily  used 
very  little,  but  our  program  made  heavy  and  consistent  use  of  it  throughout  the 
computation.)  It  was  then  decided  to  rerun  the  entire  computation  and  compare 
the  resuits  of  the  two  runs;  as  it  turned  out,  only  one  erroneous  value  had  gone 
unnoticed  (  although  another  had  been  found  by  visual  inspection  of  the  output)* 

Another  type  of  inadequacy  which  can  occur  in  software  is  the  "sin  of 
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omission".  For  example,  a  FORTRAN  compiler  may  fail  to  inform  the  programmer 
of  some  irregularity — such  as  a  variable  which  is  defined  but  never  used.  This 
will  undoubtedly  affect  the  results  of  the  computation  if,  for  example,  the  vari¬ 
able  name  was  misspelled  when  it  was  intended  to  be  used  ( or  when  defined) . 
The  compiler  might  then  merely  reserve  a  cell  for  the  one  spelling  of  the  variable 
name,  setting  its  contents,  say,  to  zero;  the  programmer,  not  having  been  in¬ 
formed,  may  not  realize  that  this  has  happened.  For  reasons  of  this  sort,  it  is 
well  to  have  the  program  compiled  using  the  best  diagnostic  compiler  available, 
even  if  this  means  that  the  program  cannot  make  use  of  the  more  powerful  lan¬ 
guage  available  on  the  object  computer.  When  the  program  compiles  success¬ 
fully  under  the  diagnostic  compiler,  it  can  then  be  recompiled  for  use  on  the 
computer  best  suited  to  do  the  computation.  The  use  of  a  good  diagnostic  com¬ 
piler  will  usually  also  result  in  a  reduction  of  both  man-hours  and  elapsed  time 
required  to  get  a  program  running. 

Operating  systems  can  have  similar  faults.  For  example,  in  the  zeta 
function  computation,  a  change  in  logical  unit  numbers  for  the  purpose  of  be¬ 
ginning  a  new  reel  of  tape  was  handled  incorrectly  by  the  program.  This,  of 
course,  was  not  the  fault  of  the  operating  system.  However,  one  result  was  that 
the  operating  system  was  asked  to  write  on  a  tape  which  had  not  been  declared 
by  the  control  cards.  The  monitor  merely  assigned  a  scratch  tape,  with  no 
indication  that  this  had  been  done;  wrote  on  it,  then  released  it  at  the  end  of 
the  run — with  the  result  that  the  information  was  lost.  Had  the  monitor  informed 
us  of  this  assignment,  no  harm  would  have  been  done  due  to  the  rotation  scheme 
of  tapes;  as  it  was,  this  programming  blunder  cost  us  nearly  five  hours  of  com¬ 
puter  time. 
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7.  Errors  due  to  program  failure.  In  this  category  we  place  all  errors  in  pre¬ 
paring  the  problem  for  computation.  These  include  errors  in  logic,  flow  charting, 
coding,  transcription,  keypunching,  etc.  —  the  sorts  of  errors  usually  referred 
to  as  "bugs".  Chapter  13  of  [1]  gives  a  general  discussion  of  the  problem  of 
program  debugging  or  checkout  and  lists  some  specific  methods  which  can  be 
used  to  determine  the  source  of  error  once  it  has  been  determined  that  an  error 
exists.  And,  of  course,  most  programmers  have  developed  their  own  methods 
of  debugging  through  experience.  The  problem  is  in  determining  whether  an 
error  exists  in  the  program. 

A  program  should  be  flow  charted  before  coding  begins.  All  too  often 

this  step  is  skipped  because  "the  problem  is  so  simple  that  a  flow  chart  is  not 

necessary".  But,  particularly  if  the  programming  is  being  done  for  someone 

else,  a  flow  chart  is  of  great  value.  For  example,  the  person  originating  the 
problem  can  usually  check  the  flow  chart  ( for  this  reason,  it  is  a  good  idea  to 
adopt  some  standard  for  flow  charting  symbols  within  the  organization;  see 
appendix  for  a  suggested  standard) ;  if  this  is  done,  he  may  uncover  errors  due 
to  lack  of  communication  between  himself  and  the  programmer  as  well  as  errors 
in  logic  or  design  of  the  computation.  The  programmer  should  also  prepare  a 
glossary  of  variable  names  used. 

Once  programming  has  begun,  it  is  rare  indeed  that  the  flow  chart  is 
followed  exactly.  There  are  usually  places  in  the  program  where  more  efficient 

or  more  effective  ways  of  doing  the  job  occur  to  the  programmer  as  he  writes  the 

program.  This  is  to  be  expected;  however,  the  flow  chart  should  be  updated  and 

rechecked  as  the  programming  progresses. 

When  the  program  is  complete,  it  should  be  desk-checked  by  another 
programmer  { it  is  taken  for  granted  that  the  programmer  checks  his  own  work) 
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before  it  is  tested  on  the  computer.  However,  it  is  usually  a  good  idea  for 

both  the  programmer  and  the  checker  to  work  from  a  listing  of  the  card  deck 
for  the  final  pre -computer  check;  this  also  tends  to  reduce  the  number  of  key¬ 
punching  errors  which  actually  get  to  the  computer.  The  initial  number  of  key¬ 
punch  errors  will  be  reduced  if  the  programmer  writes  the  program  neatly  on  the 
coding  sheets  and  uses  standard  graphic  symbols  for  characters  which  may  be 
confused  —  for  example,  O  and  0  ,  I  and  1  . 

In  particularly  critical  cases,  the  program  should  be  written  in  two 
different  languages  or  for  two  different  computers  and  the  results  of  test  runs 
compared  before  debugging  Is  considered  complete.  In  the  zeta  function  pro¬ 
grams,  critical  subroutines  were  written  for  both  the  CDC  1604  and  the  CDC 
3600,  checked  out  on  both  machines,  and  then  results  of  test  runs  were  com¬ 
pared.  In  addition  to  the  expected  number  of  program  bugs  uncovered  by  this 
procedure,  we  found  that  the  double  precision  arithmetic  program  used  by  1604 

FORTRAN  was  in  error  ( it  returned  a  value  of  zero  for  a-b  if  |  a-b|  was  less 
than  a  X  2  ) ,  and  that  the  floating  divide  instruction  on  the  1604  did  not 

round  correctly.  Although  both  of  these  discoveries  involved  the  machine  we 
were  not  going  to  use  for  the  final  computation,  this  can  only  be  regarded  as 
fortuitous. 

Of  course,  the  results  of  the  computation  should  be  compared  against 
a  known  check  case,  and  this  check  case  should  encompass  as  many  of  the 
features  of  the  program  as  possible.  This  case  might  be  a  hand-computed 
special  case,  a  case  computed  by  an  existing  program  which  is  known  to  be 
correct,  or  values  given  in  published  tables.  All  discrepancies  should  be 
completely  explained.  ( In  one  case  we  found  an  erroneous  value  in  one  of  the 
tables  we  were  using  for  checking  purposes) . 

All  possible  branches  of  the  program  should  be  checked  out,  by  fake 
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data  if  necessary.  This  not  only  assures  that  each  branch  works  as  it  should, 
but  also  that  the  different  branches  interact  properly. 

Finally,  when  possible,  the  program  should  be  checked  by  comparing  the 
results  obtained  by  another  program  using  a  totally  different  method  ( and,  if 
possible,  written  by  another  programmer  without  collaboration. ) 

A  word  should  be  said  about  the  choice  of  programming  languages.  In 
some  cases,  machine  language  offers  such  benefits  that  it  virtually  has  to  be 
used.  (  see,  for  example,  [12]).  In  such  cases,  it  should  be  used;  however, 
errors  are  much  easier  to  make  in  programming  in  machine  language,  and  they 
are,  as  a  rule,  much  harder  to  find.  For  the  most  part,  one  of  the  problem 
oriented  languages  should  be  used;  preferably  the  one  with  the  best  diagnostic 
capabilities  available.  But  here,  too,  caution  is  necessary;  as  we  have  noted, 
these  compilers  themselves  may  contain  errors  which  might  result  in  an  erroneous 
program  being  compiled. 

In  the  zeta  function  program,  the  necessity  of  obtaining  a  degree  of 
efficiency  impossible  with  FORTRAN,  together  with  the  necessity  of  complete 
control  over  operations  used  and  operation  sequences  ( due  to  the  a  priori  error 
analysis)  dictated  the  use  of  machine  language  for  much  of  the  programming. 
These  portions  of  the  program  were  also  written  in  FORTRAN,  and  the  two  ver¬ 
sions  were  compared.  We  used  FORTRAN  for  the  remainder  of  the  program,  but 
the  listings  of  the  machine  language  created  by  the  compiler  were  carefully 
checked  to  make  sure  that  the  compiler  indeed  created  a  program  which  would 
do  exactly  what  was  intended. 


8.  Errors  due  to  faulty  operation.  These  errors  include  computer  operator 
errors  and  errors  in  setting  up  the  program  for  a  run,  preparing  data,  etc. 

The  best  way  of  guarding  against  operator  errors  is  to  make  the  program 
as  easy  as  possible  to  operate,  and  minimize  the  necessity  for  operator  inter¬ 
vention.  The  program  should  check  any  tapes  provided  as  input  to  make  sure 
that  the  correct  tape  is  being  provided.  In  addition,  all  input  tapes  should  be 
file  protected.  This  will  mean  that  the  program  cannot  write  on  the  input  tape;  if  it 
is  necessary  to  have  a  tape  containing  both  the  input  and  the  output  from  the 
run,  the  input  tape  should  be  copied  onto  the  output  tape. 

As  far  as  errors  in  data  preparation  and  orogram  set-up  are  concerned, 
the  program  should  be  designed  for  ease  of  set-up  and  data  preparation.  For 
example,  input  might  be  free-format,  and  each  quantity  might  be  identified  on 
the  data  card  by  its  name.  The  program  could  then  read  the  name,  locate  the 
parameter  in  the  program,  then  read  the  value  and  store  it.  The  program  should 
also  make  thorough  checks  on  the  consistency  of  the  input  data,  and  print  out 
the  input  data  for  later  visual  inspection. 

9.  Errors  due  to  inadequate  planning.  Errors  of  this  sort  should  be  caught  in 
the  programming  or,  if  they  manifest  themselves  during  the  running  of  a  program, 
they  should  be  caught  by  the  program.  For  example,  if  too  little  storage  space 
is  allotted  for  an  array,  the  program  should  catch  this  in  its  check  to  make  sure 
that  the  array  length  is  not  exceeded.  (Note  that  such  a  check  is  not  compiled  auto¬ 
matically  bymostFORTRAN  compilers.  )  Again,  the  tape-switching  error  described  in 

Section  6  was  definitely  a  planning  error,  and  should  never  have  reached  the  pointof 
affecting  the  computation  --but  it  did. 

Several  things  can  be  done  to  help  here.  First,  the  planning  of  a 


program  should  include  provisions  for  the  program  to  be  expanded  ( or,  more 
properly,  for  its  task  to  be  expanded)  beyond  the  wildest  dreams  of  the  pro¬ 
grammer  and  the  project  originator.  For  example,  if  the  program  is  to  use  tape 
as  an  output  medium,  take  it  for  granted  that  it  will  use  an  arbitrarily  large 
number  of  reels,  even  though  "it  can't  possibly  fill  up  more  than  half  a  reel". 

Then  plan  accordingly. 

One  pofnt  which  seems  minor,  but  can  make  life  easier  for  all  concerned, 
is  this:  All  variable  names  used  in  the  program  should  agree  as  nearly  as  possible 
with  the  names  used  for  them  in  the  planning  stages.  This  may  mean  that  some 
variable  names  will  not  coincide  with  their  usual  type  ( integer  or  floating  point) 
if  FORTRAN  is  used,  and  extra  declarations  will  be  necessary;  however,  that 
price  is  small  compared  to  the  cost  of  confusion  arising  when  there  are  a  large 
number  of  variables,  each  of  which  has  two  names.  In  the  zeta  function  pro¬ 
grams,  matters  were  arranged  so  that  nearly  all  variables  could  be  changed  by 
supplying  the  program  with  a  card  giving  the  name  of  the  variable  and  the  value 
desired.  The  idea  was  good,  and  it  worked  well  in  practice,  except  that  most 
variables  had  both  "internal"  and  "external"  names,  and  confusion  was  all  too 
common.  In  fact,  we  later  needed  the  capability  of  changing  variables  which 
had  not  originally  been  designed  for  changing  and  we  accomplished  this  by 
specifying  the  location  in  the  parameter  array  to  be  changed.  This  gave  us  yet 
a  third  name  for  most  variables,  and  the  confusion  was  somewhat  disconcerting. 

In  the  planning,  adequate  provision  should  be  made  for  restarting  the 
computation  and  for  recovery  of  information  lost  due  to  hardware  or  software 
failure.  The  design  of  the  zeta  function  program  left  something  to  be  desired 
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in  this  regard.  In  the  interest  of  minimizing  the  number  of  tapes  used,  we 
wrote  the  restart  information  on  the  output  tape.  The  tape  format  should  have 
been  designed  so  that  restart  information  could  be  interspersed  with  "good" 
output,  and  restart  information  should  have  been  provided  frequently.  Unfor¬ 
tunately,  this  was  not  done;  the  result  was  that  the  only  restart  points  were  at 
the  ends  of  successful  runs.  Consequently,  a  failure  near  the  end  of  the  run 
either  caused  the  loss  of  the  entire  run,  or  necessitated  tedious  ( and  error- 
prone)  preparation  of  restart  information  by  hand. 

Finally,  the  planning  of  the  computation  should  include  provisions  for 
all  of  the  contingencies  discussed  in  the  previous  sections. 

10.  Conclusion.  After  all  of  these  precautions  have  been  taken  and  after  all 
of  this  tedious  and  laborious  detail  has  been  attended  to,  can  we  be  certain 
that  the  results  of  our  computation  will  be  accurate?  No.  At  least,  not  100% 
sure;  there  is  always  room  for  human  error  in  human  endeavors.  But  we  have 
acknowledged  the  many  possible  sources  of  error,  and  we  have  approached  the 
computation  in  a  systematic  and  careful  manner,  with  the  specific  intention  of 
eliminating,  minimizing,  or  controlling  and  completely  determining  the  effects 
of  these  errors.  We  have  not  merely  ignored  the  problem  in  the  hope  it  will  not 
occur;  we  have  attempted  to  solve  it.  In  that  sense,  we  can  place  a  good  measure 
of  confidence  in  the  results,  and  the  element  cf  doubt  that  must  certainly  remain 
is  intelligent  doubt.  In  any  case,  we  are,  certain  that  we  have  done  the  best 
we  can. 
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COMPUTATION  OF  THE  DYNAMIC  RESPONSE  OF  A  MEMBRANE 


Alvars  CelmiijS 

Applied  Mathematics  Division 
U.S.  Army  Ballistic  Research  Laboratories 
Aberdeen  Proving  Ground,  Maryland 


ABSTRACT 

Typical  deformations  of  membranes  in  dynamic  response  problems  are 
large  and,  therefore,  a  linearization  of  the  problem  by  assuming  "small 
deflections"  is  not  possible.  For  the  same  reasons  the  stress-strain 
laws  cannot  be  assumed  linear.  The  problem  to  be  solved  is  then  a 
non-linear  totally  hyperbolic  second  order  system  of  partial  differential 
equations.  In  a  specialized  case  considered  here  the  system  consists  of 
two  equations  for  two  unknown  functions. 

For  the  solution  of  equation  systems  of  the  type  mentioned  (i.e., 
u.  ,  =  f(x,t,u,u  ,u  ,u  ))  a  two-level  difference  scheme  with  a  predictor- 

corrector  algorithm  will  be  presented.  The  quadrature  error  of  the 
solution  method  is  of  fourth  order.  The  solution  method  satisfies 
necessary  (von  Neumann)  stability  conditions  for  local  stability. 

Some  applications  of  the  solution  method  to  membranes  with  hysteresis 
type  stress -strain  laws  will  be  presented.  The  computations  exhibit  the 
nature  of  the  propagation  of  stresses  along  the  membrane.  They  can  be  used 
to  analyze  numerically  the  dynamic  response  of  membranes,  to  predict  places 
of  possible  failures  and  to  predict  permanent  deformations.  Together  with 
adequate  experiments  such  computations  may  be  used  to  test  the  existing 
theories  about  the  behavior  of  membranes  under  large  stresses. 


1.  PROBLEM  OUTLINE 

Typical  objectives  of  investigations  concerned  with  the  dynamic 
response  of  thin  metal  shells  to  blast  loads  are  the  predictions  of  large 
and  permanent  deformations.  This  means  that  the  corresponding  equations 
of  motion  cannot  be  linearized  assuming  "small  deflections"  nor  can  the 
stress-strain  law  be  assumed  linear.  In  order  to  keep  the  computation  of 
such  deformations  in  spite  of  these  conditions  as  simple  as  possible  a 
membrane  theory  may  be  applied  by  assuming  that  the  bending  strength  of 
the  shell  is  negligible.  The  present  paper  deals  with  calculations 
within  the  scope  of  such  a  membrane  theory.  It  is  an  open  question  to 
what  extent  this  model  can  be  used  to  describe  the  behavior  of  a  real 
shell.  This  problem  will  not  be  discussed  here.  It  could  be  clarified 
by  comparison  of  the  computed  deformations  with  corresponding  experimental 
values.  Independently  of  such  experiment,  the  calculations  presented  may 
be  of  interest  as  an  example  for  the  numerical  treatment  of  a  system  of 
non-linear  partial  differential  equations.  The  algorithm  developed  does 
not  utilize  the  special  form  of  the  equations  and  can  be  applied  to  any 
hyperbolic  second  order  system  with  two  independent  variables. 
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Examples  for  the  application  of  the  algorithm  are  presented  in 
Section  4.  They  show  the  necessity  to  keep  in  close  touch  with  the 
mechanical  interpretation  of  the  mathematical  expressions  used.  In  the 
case  considered  here  the  discretized  equations  of  motion  of  the  membrane 
can  be  interpreted  as  equations  of  a  mechanical  model  of  the  membrane. 

This  interpretation  gives  a  better  insight  into  the  behavior  of  the 
solutions  and  permits  to  select  an  adequate  discretization  process. 

About  the  membrane  we  make  the  following  assumptions  throughout 
the  paper: 

(a)  the  membrane  is  perfectly  flexible 

(b)  the  membrane  is  infinitely  long  in  one  direction,  say  z; 

( c )  the  loads  are  independent  of  z . 

We  note  that  with  the  assumptions  (b)  and  (c)  we  are  actually  reducing  the 
problem  to  the  motion  of  a  perfectly  flexible  wire  in  a  plane.  This 
limitation  arised  naturally  from  applications  which  will  not  be  discussed 
here.  The  elastic  properties  of  a  membrane  which  satisfies  (a),  (b)  and 
(c)  can  be  described  by  a  single  stress -strain  function  a(e).  In  view 
of  the  applications  this  function  was  assumed  as  general  as  possible.  Of 
particular  interest  was  a  history  dependent  stress-strain  function  which 
is  a  combination  of  Hooke’s  linear  and  Bell's  parabolic  laws.  An  example 
of  such  a  function  is  shown  in  Figure  1.  (The  quantity  A  in  a  =  AV/e~  is 
equal  to  a(l-T/Tm),  where  a  is  essentially  constant,  T  is  the  temperature 

of  the  material  and  T^  is  its  melting  temperature.  For  details  see  Ref.  1.) 
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One  consequence  of  the  dependence  of  a(e)  on  the  stress  history 
is  that  in  general  for  each  element  of  the  membrane  we  have  a  different 
c(e)  at  any  given  time,  even  if  the  constants  A  and  E  are  the  same  for 
all  elements.  Hence  if  the  problem  is  solved  under  these  conditions, 
the  inclusion  of  more  complicated  stress-strain  relationships  requires 
only  trivial  changes  of  the  computing  process.  (More  complicated  laws 
may  be  assumed  if  the  membrane  is  combined  of  parts  of  different 
materials  or  heated  during  the  motion.) 

Typical  loads  on  the  membrane  are  time  dependent  pressures.  The 
computation  process  presented  here  can  be  used,  however,  for  the 
treatment  of  more  general  loads,  too.  Such  loads  may  be  needed,  for 
instance,  to  obtain  the  motion  of  a  membrane  (or  wire)  hit  by  a  projectile. 


2.  EQUATIONS  OF  MOTION 

We  are  mainly  interested  in  the  motion  of  a  membrane  which  is  fixed 
at  two  boundaries  and  flat  at  the  beginning  of  the  motion.  We  chose, 
therefore,  a  coordinate  system  adapted  to  that  situation.  (This  choice 
does  of  course  not  exclude  other  initial  configurations.)  We  assume 
that  the  fixed  boundaries  are  perpendicular  to  the  x,y-plane  and  pass 
through  the  x-axis  at  x  =  a  and  x  =  b.  If  we  think  of  the  problem  in 
terms  of  an  extensible  wire,  then  the  wire  occupies  at  rest  the  segment 
of  the  x-axis  between  x  =  a  and  x  =  b.  Loads  on  the  wire  are  within 
the  x,y-plane  only.  The  deformation  of  the  membrane  can  be  described 
using  this  coordinate  system  by  two  functions,  u(x,t)  and  v(x,t)  as 
shown  in  Figure  2.  Thus,  a  point  of  the  membrane  which  is  at  time  zero 


located  at  x  =  x,  y  =  0,  has  at  time  t  the  coordinates 
x  =  x  +  u(x,t),  y  =  v(x,t). 

Denoting  time  derivatives  by  dots  and  derivatives  with  respect  to 
x  by  apostrophes  we  obtain  the  following  equations  of  motion: 
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u  =  —  (a>cos  ep) *  -  — "s'  *sin  0, 
o'  T  0  ■  c 

vo  Ko 

(1) 

v  =  —  (a*sin  cp) '  +  -2 —  -s'cos  0, 

P  '  '  '  p  •  c  7 

0  0 

(2) 

s'  =  \J  ( 1+U '  )2  +  v'  2  , 

(5) 

sin  cp  =  v'  /s'  , 

w 

cos  cp  =  ( 1+u'  )  /s '  , 

(5) 

2u'  +  u'2  +  v'2 

1  +  S'  +  Vs  ’ 

(6) 

where 


p  (x)  is  the  density  of  the  membrane  at  the  beginning  of  the  motion, 

c  is  the  thickness  of  the  membrane, 

g  is  the  strain  in  the  membrane  when  it  rests  between  x  =  a  and  x  =  b, 

o 

p(x,t)  is  a  pressure  type  load  acting  in  direction  0.  (For  normal 
pressure  3=9.) 

A  detailed  derivation  and  discussion  of  these  equations  is  given  in 
Ref.  2. 


In  addition  to  these  equations  we  assume  that  a  stress-strain 
relationship  a(e)  of  the  type  discussed  in  Section  1  is  given. 


Equations  (l)  and  (2)  can  be  replaced  by  the  following  quasi - 
linear  Equations  (l*)  and  (2*),  respectively: 

u  =  u"  —  [— -  •  sin2cp  +  (l+eQ)  cos2cp]  + 

(1*) 

+  v"  r-  [■  f  +  (1+eo}  If] sin  9'cos  9  ■  p^c s'  *sin 

Ho  0 

v  = u"  r  ["  f  +  (1+eo}  If  • sin  cp,cos  *] + 

0  (2*) 

+  V"  0~  [b?  C°S2CP  +  ^+G0^  if  Sln2cP]  +  p^  S'C°S 

^O  o 
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In  order  to  simplify  the  notation  let  w  be  a  vector  function. 

(in  the  present  case  w  =  (^).)  The  equations  of  motion  are  then  of  the 
type 

w  =  f(x,t,w,w,w' ,w").  (7) 

The  slopes  a  of  the  characteristics  of  (7)  in  the  x;t-plane  satisfy 
the  equation 

det  (a'TD-l)  =  0,  (8) 

where 

D  =  df/dw"  (9) 

is  the  Jacobian  of  f  and  I  is  the  unity  matrix.  If  all  roots  of 
Equation  (8)  are  real  we  call  Equation  (j)  totally  hyperbolic 

(Ref  3,  P.175). 

The  conditions  for  total  hyperbolicity  of  Equation  (7)  can  be 
expressed  in  case  of  two-dimensional  w  conveniently  in  terms  of  the 
trace  ®  and  determinant  A  of  the  Jacobian  D.  In  terms  of  these  quantities 
Equation  (8)  becomes  namely 

(a2®)2  .  k.  _  a2@  +  1  =  0  (10) 

and  this  equation  has  four  real  roots  if  and  only  if 

©  ;>  0 

and  (ll) 

©2/4  ;>  A  :>  0. 

Computing  A  and  0  from  Equations  (l*)  and  (2*)  we  obtain 


with 


®  =  7(1+7), 

(12) 

^0 

A  =  ©2  .  —1 — 

1  +  7 

(15) 

a 

(1+e  )  •  s'  ■  da/de 

(14) 
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These  quantities  0  and  A  satisfy  obviously  the  conditions  (ll). 
Hence  the  equations  of  motion,  (l*:)  and  (2*),  are  totally  hyperbolic. 
The  slopes  of  the  characteristics  are 


0/ 


min 


=  I  (1  +  \fl  -  4(A/®2)  )  1  =  An/de  (15) 


and 


a 


max 


i  (i  +\fi  -  MA/e2)  )  = 


(16) 


a  .  is  the  inverse  of  the  longitudinal  wave  velocity  and  a  is  the 
min  J  max 

inverse  of  the  transversal  wave  velocity  (Ref.  L). 


3.  NUMERICAL  SOLUTION 

A  numerical  solution  of  the  equations  of  motion  of  an  extensible 
string  has  been  considered  by  Cristescu  (Ref.  L).  He  uses  a  characteristics 
method  and  presents  examples  for  a  string  with  a  linear  stress-strain  law 
(Ref.  5)-  In  this  paper  we  present  another  numerical  method  which  has 
the  advantage  of  simplicity  and  which  may  be  readily  applied  to  any 
totally  hyperbolic  second  order  system  with  two  independent  variables. 

It  does  not  require  a  transformation  of  the  equations  in  characteristics 
form  nor  a  transcription  in  form  of  a  system  of  first  order  equations. 

A  new  feature  of  the  numerical  scheme  is  the  employment  of  a  two  level 
finite  difference  scheme  for  a  system  of  second  order  equations.  This 
has  the  advantage  that  the  sizes  of  the  time  steps  can  be  changed  easily 
during  the  computation  process,  thus  reducing  computing  time  and  enhancing 
convergence  and  accuracy  of  the  method. 

We  now  give  a  short  description  of  the  numerical  solution  method. 

A  detailed  description  and  analysis  of  the  method  is  given  in  Ref.  6. 

The  solution  method  is  based  on  the  following  interpolation  formulae 
for  the  vector  functions  w  and  w: 


w(x,t+k) 


+ 


1 

l 


=  w(x,t)  +  kw(x,t)  +  j  k2  w(x,t) 
k2  w(x,t+k)  -  ^  k^  w(x,t+9k), 


+ 


(17) 


with  0  ^  9  s  1,  and 

w(x,t+k)  =  w(x,t)  +  ^  k»w(x,t)  + 
+  |  k  w(x,t+k)  -  J2  ^  w(x,t+§k). 


(18) 
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with  0^9^ 

For  the 
rectangular  c 
defined  by 


1. 

numerical  realization  of  these  formulae  we  establish  a 

omputing  grid  in  the  x,t -plane  with  grid  points  (x.,t.) 

a  J 

h  =  (b-a)/n. 


=  a  +  i-h,  (i=0,l,2, . . . ,n) 


(19) 


and 


t 

o 


=  0 


t . 
J 


r=l 


(j=l,2,...). 


(20) 


This  grid  is  equidistant  in  the  x-direction  and  it  has  variable  time 
steps  kr.  The  step  we  chose  such  that  the  estimated  domain  of 

dependence  for  every  point  (x^t  ,  i=l,2, . . .  ,n-l  is  located  for 


t=t  between  the  points  (x.  .  ,t  )  and  (x.  ,,.t  ). 
r  *  v  i-l'  r '  v  l+l’  r' 

the  grid  points  we  denote  by  corresponding  indices 

w(x. ,t  )  =  w.  .  etc. 
l  r  l;r 


Values  computed  at 
,  i.e.,  w(xi,tj>)  =  w_^ 


The  computing  step  which  furnishes  values  of  w  and  w  at  the  grid 
points  with  t  =  tr+^  from  those  at  t  =  t  is  a  predictor-corrector 

type  algorithm  and  may  be  described  by  the  following  four  substeps. 


Sub step  1 


For  t  =  we  obtain  approximations  to  the  space  derivatives 


V.  =  isr  (W..,  -  W.  -  ), 

i,r  2h  l+l, r  x-l,r  ’ 


W’.'  =  (W.^,  -  2W.  +  W.  ,  ), 

i,r  h2  i+l,  r  x,r  i-l,r' 


(21) 


Using  these  approximations  and  Equation  (7)  we  compute  the  approximations 
$i  We  also  compute  with  Equation  (15)  a  ^  every  grid  point 

(xi;tr),  i=l,  ...,n-l  and  establish  the  new  time  interval  by 


r+1 


h 


Min 

1  s  i  <  n-l 


a  .  (x.,t  ) 

mm  i’  r 


(22) 
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Substep  2  ("Predictor"). 


We  compute  by  extrapolation  approximations  to  and  r+^: 


*W 

*W 


i,r+l 

i,r+l 


W,  +  k  X  +  k  k2xnW,  , 
i,r  r+1  i,r  2  r+1  i,r 


W.  +  k  ^W.  . 

1 , r  r+1  l , r 


(23) 


Substep  3« 

We  compute  approximations  to  the  space  derivatives  for  t  =  t  , 
using  the  latest  approximations  of  W^ 


*W'  =  —  (*\j  _  *w  ) 

i,r+l  2h  ^  i+l,r+l  i-l,r+l'> 


*W" 


(X 


i,r+l  h2  v  i+l,r+l 


2  *W  ._  +  *W.  .  ._). 

i,r+l  l-l, r+1' 


(24) 


Corresponding  *W^  r+1  are  then  computed  with  Equation  (7)  using  these 
values . 

Sub step  4  ("Corrector"). 


The  approximations  *W^  ^+1  and  *Wi  r+1  are  corrected  by  computing 


new  approximations: 

W 


,  ,,  =  *W.  +  \  k2  (X  -  W,  ), 

i,r+l  i,r+l  6  r+1  '  i,r+l  i,r  ’ 


W,  =  *W,  +  \  k  (*W  .  -  W,  ). 

i,r+l  i,r+l  2  r  '  i,r+l  i,r 


(25) 


The  boundary  conditions  enter  the  computations  at  appropriate  substeps 
for  i  =  0  and  i  =  n. 

If  the  computations  stop  at  Substep  4,  we  have  a  predictor-corrector 
type  computing  scheme.  A  possible  variation  of  the  scheme  can  be  obtained 
by  an  iteration  between  Substeps  3  and  4.  This  amounts  to  an  iterative 
solution  of  Equations  (17)  and  (l8)  without  the  remainder  terms. 

It  can  be  shown  (Ref .  6)  that  von  Neumann  conditions  for  local 
stability  are  satisfied  for  this  computing  scheme  if  the  system  of 
Equation  (7)  is  totally  hyperbolic,  f  differentiable  with  respect  to  all 
its  arguments  and 


k/hZ/3-  min  ))■  (26) 

a  <  x  <  b 
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Hence,  by  chosing  the  time  interval  in  accordance  with  the  domain  of 

dependence,  i.e.,  Equation  (22),  we  remain  well  within  this  von  Neumann 
bound. 

In  the  present  two-dimensional  case  we  can  express  the  stability 
and  hyperbolicity  conditions  in  terms  of  the  determinant  A  and  the  trace 
0  of  the  Jacobian  D.  The  results  are  shown  in  Figure  3- 


Figure  3-  Local  stability  of  computing  scheme 

The  order  of  the  computing  scheme  is  L  for  W  and  3  for  W  and  it 
converges  to  the  solution  with  refinement  of  the  grid  if  the  solution 
is  smooth.  The  replacing  of  the  predictor-corrector  scheme  by  an 
iterative  process  reduces  some  error  terms  but  does  not  increase  the 
order  of  the  approximation.  Hence  a  general  improvement  of  the  solutions 
by  iteration  cannot  be  expected.  -  The  computing  scheme  is  not  suitable 
for  the  treatment  of  motions  with  shock  phenomena. 


b.  EXAMPLES 

The  numerical  solution  method  outlined  in  the  previous  section 
is  a  finite  difference  method.  It  requires,  therefore,  a  discretization 
of  the  equations  of  motion.  The  discretized  equations  might  not  be  an 
adequate  description  of  the  continuous  mechanical  model  for  which  the 
original  equations  of  motion  are  derived.  In  such  cases  we  may  try  to 
modify  the  discretization  process.  The  adequacy  of  the  final  equations 
can  be  established,  for  instance,  if  we  can  find  a  mechanical  model  for 
the  discrete  equations.  If  that  model  is  sufficiently  simple  it  can  be 
of  great  help  for  the  understanding  of  the  behavior  of  the  solutions.  We 
illustrate  this  situation  by  considering  the  stresses  and  strains  for  a 
special  deformation  of  the  membrane. 
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Assume  that  for  t  -  0  the  membrane  is  stress  free  (e  =  0)  and 

o 

exposed  to  a  load  as  shown  in  Figure  4a.  At  time  t  =  t^  the  load  may  be 

zero  and  the  deformation  as  in  Figure  4b.  It  is  obvious  that  in  this 
situation  the  point  x  =  3  of  the  membrane  will  have  a  positive 
acceleration  (for  t  =  t^).  An  inspection  of  Equation  (2*)  shows  that 

indeed  for  t  =  t^ 

Vj  =  (Vj  o5)/(pQs5), 

that  is,  v^  >  0  if  >  0.  Hence  necessary  for  v^  >  0  is  >  0.  (In 
case  of  a  linear  stress-strain  law  a  =  Ee,  e,  >  0  is  also  sufficient  for 
positive  acceleration.)  Since  e  is  computed  from  Equation  (6)  with  v^  =  0 
and  >  0,  we  have  simply 

e3  =  U3  >  °' 

as  expected. 

For  the  discretized  problem  we  use  u-  and  v-values  at  the  grid 
points  only.  Because  at  these  points  ^  =  0,  we  obtain  u^  =  0  and 

e_  =  0.  (The  other  e,  are  shown  in  Figure  4c.)  Hence  if  the  numerical 
3  i 

scheme  is  applied  to  the  deformation  in  Figure  4b  we  obtain  v^  =  0,  which 

is  obviously  not  correct.  This  demonstrates  that  the  formally  discretized 
equations  of  motion  do  not  describe  adequately  the  membrane  model. 


Searching  for  a  more  adequate  numerical  algorithm  we  may  change  the 
computation  process  of  e.  Instead  of  using  Equation  (6)  we  compute  the 
deformations  between  the  grid  points  k  and  k  +  1  by 

ek,k+l  =  ^sk,k+l  "  h^h» 


where  s,  ,  ,  is  the  distance  between  the  points  k  and  k  +  1  (see  Figure  4e) 

K  y  K+I 

The  strains  at  the  grid  points,  which  are  needed  to  evaluate  the  right 
hand  sides  of  Equations  (l*)  and  (2*),  we  can  then  compute  by  averaging: 


ei  "  2  (®i-l,i  +  ei,i+l^ ' 

The  resulting  are  shown  in  Figure  4d.  Obviously  these  values  are 

better  than  those  of  Figure  4c,  showing  a  qualitatively  correct  behavior 
for  the  deformation  considered.  However,  objections  can  be  raised  against 
an  averaging  in  case  of  a  general  stress-strain  law.  In  the  mechanical 
model  this  may  amount  to  an  averaging  between  materials  in  elastic  and 
plastic  status.  Computationally  we  may  average  between  two  different 
branches  of  the  function  a(e). 
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Fig.  V.  Strain  computations. 
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The  last  considerations  suggest  another  possibility  to  handle  the 
problem:  we  compute  strains  and  stresses  between  the  grid  points  as 
above,  but  do  not  asign  specific  values  of  a  at  the  grid  points. 

(Figure  4f)  This  approach  has  a  simple  mechanical  model,  namely  a  chain 
consisting  of  mass  free  extensible  and  straight  links  and  mass  points 
attached  to  the  hinges  at  the  grid  points.  (Figure  4e)  Computationally 
this  approach  amounts  to  a  discretization  of  Equations  (l)  and  (2)  instead 
of  Equations  (l*)  and  (2*),  respectively. 

All  three  methods  for  the  computation  of  e  and  a  were  tried  out  in 
the  examples  shown  later.  The  first  and  second  method  proved  to  be 
numerically  unstable.  The  instabilities  appeared  first  at  places  where 
an  unloading  of  the  membrane  took  place,  that  is,  where  a(e)  ceased  to 
be  a  unique  function.  The  computations  for  the  chain  model  had  no 
instabilities.  All  three  models  furnished  practically  identical  results 
before  unloading. 

The  first  example  of  the  computations  deals  with  the  dynamic  response 
of  a  membrane  to  a  high  pressure  load  of  short  duration,  as  shown  in 
Figure  5.  The  membrane  was  assumed  20cm  wide,  1mm  thick  and  having  the 


following  material  constants: 

Density  p  =2.7  g/crn^; 

o  2 

Linear  elasticity  modulus  E  =  7200  kg/mm  : 

Parabolic  elasticity  modulus  A  =  40  kg/mm  . 

(These  values  may  correspond  to  an  aluminum  aloy.)  The  initial  strain 
e  was  assumed  zero.  The  initial  pressure  p  was  assumed  to  be  100 
o  2  0 

technical  atmospheres  (=  100  kg/cm  ). 

The  deformations  at  various  time  instants  are  shown  in  Figures  6,  7 
and  8  together  with  the  corresponding  stress  and  strain  profiles.  It  is 
interesting  to  note  that  the  latter  profiles  are  almost  constant  for  any 
fixed  time.  The  reason  for  this  is,  that  the  stresses,  induced  at  the 
boundaries  of  the  membrane  are  carried  by  fast  longitudinal  waves  and 
increased  by  relatively  small  steps  as  these  waves  travel  back  and  forth 
across  the  membrane.  The  transversal  waves  do  not  carry  any  larger 
amount  of  stresses  in  this  case.  A  practical  consequence  of  this 
behavior  of  the  membrane  is  that  a  failure  can  occur  at  any  place  and  a 
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rupture  is  not  bound  to  take  place  at  the  boundaries  or  at  the  point  of 
symmetry. 

As  mentioned  in  Section  3  the  numerical  scheme  uses  variable  time 
steps,  depending  on  the  slope  of  characteristics.  In  the  example 
presented  this  slope  changes  as  the  membrane  expands  because  of  the 
non-linear  stress-strain  law.  The  time  intervals  are  changed  correspondingly 
and  their  sizes  displayed  in  Figure  9>  where  the  size  of  every  tenth 
time  step  is  plotted  over  the  corresponding  time  value.  Note,  that  at  the 
beginning  of  the  calculations  the  time  intervals  remain  constant,  which 
indicates  that  some  parts  of  the  membrane  are  still  within  the  linear 
part  of  a(e)  (see  Figure  l) .  After  all  elements  of  the  membrane  have 
reached  the  parabolic  region  of  c(e),  the  time  intervals  are  increased 
in  accordance  with  Equations  (15)  and  (22).  This  increase  up  to  a  whole 
order  of  magnitude  results  in  an  essential  saving  of  computing  time.  At 

about  t  =  6.3*10  sec.  unloading  at  some  parts  of  the  membrane  starts. 

The  corresponding  stress-strain  law  becomes  linear  again  (Figure  l)  and 
consequently  the  time  interval  drops  to  its  original  value . 

Another  example  of  the  computations  is  shown  in  Figures  10,  11  and 
12.  The  same  membrane  as  in  the  previous  example  is  hit  by  a  projectile 

2 

at  an  angle  of  45°.  The  projectile's  cross  section  is  assumed  25mm  ,  its 

density  7-8  g/cv?  and  its  velocity  500  m/sec.  The  load  on  the  membrane 
is  in  this  case  implemented  by  computing  an  initial  velocity  of  those 
mass  points  which  are  hit  by  the  projectile,  assuming  thereby  a  plastic 
collision.  The  mass  of  the  points  involved  in  the  collision  is  increased 
for  subsequent  calculations  by  adding  the  mass  of  the  projectile,  (in 
the  case  displayed  in  the  Figures  the  number  of  grid  points  was  200, 
i.e.,  they  were  originally  1mm  apart  from  each  other.  Consequently  it  was 
assumed  that  two  adjacent  points  were  hit  simultaneously  by  the  projectile 
and  to  each  of  these  mass  points  one  half  of  the  projectiles  mass  was 
added.)  Figures  10,  11  and  12  show  the  positions  of  the  membrane  and  the 
corresponding  stresses  and  strains  at  various  instants.  As  in  the  previous 
example  considerable  strains  are  found  in  the  membrane  already  before  the 
arrival  of  the  transversal  wave.  Due  to  the  nature  of  the  load  maximal 
stresses  are  in  this  case  obtained  adjacent  to  the  impact  zone  and  at  the 
boundaries  of  the  membrane.  The  critical  instants  for  a  possible  break 
of  the  membrane  are  for  the  impact  zone  immediately  after  the  impact  and 
for  the  boundaries  the  arrival  times  of  the  transversal  waves.  (For 

-4 

the  left  hand  boundary  of  about  3*5*10  sec.  and  for  the  right  hand 

-4  \ 

boundary  at  about  7*10  sec.)  However,  as  the  permanent  deformations 
in  Figure  12,  last  picture,  show,  the  maxima  are  not  very  pronounced 
and  failure  might  be  possible  anywhere  within  the  left  half  of  the 
membrane . 
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SUMMARY 


A  novel  numerical  treatment  of  the  dynamic  response  of  a  membrane 
has  been  presented.  The  method  employs  a  chain  model  and  furnishes  all 
data  necessary  for  a  numerical  analysis  of  the  physical  phenomena.  It 
can  be  applied  to  membranes  with  arbitrary  stress-strain  laws. 
Experimental  verification  of  the  validity  of  the  model  for  practical 
problems  is  needed. 
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Fig.  8.  Blast  load . 


Fig.  10.  Oblique  impact 
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ABSTRACT 

In  this  paper  a  method  of  numerical  Integration  of  the  time- 
dependent  Navler-Stokes  equations  for  two-dimensional  flows  Is  con¬ 
sidered.  The  method  Is  restricted  to  classes  of  motion  defined  In 
a  rectangular  domain.  However,  by  use  of  a  suitable  transformation  this 
Includes  a  large  class  of  problems  describing  the  flow  past  cylinders 
in  an  otherwise  unbounded  field.  The  formulation  of  the  problem  in 
terms  of  the  stream  function  and  vorticity  is  adopted.  The  main  object 
of  the  paper  is  to  show  that  the  boundary-value  problem  which  must  be 
solved  to  determine  the  stream  function  can  be  replaced  by  a  spatial 

I 

step-by-step  integration.  Some  numerical  illustrations  of  the  methods 
used  are  given.  These  Include  some  results  for  the  flow  past  a  circular 
cylinder. 

*  Sponsored  in  part  by  the  Mathematics  Research  Center,  U.S.  Army, 
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the  report. 

**  On  leave  of  absence  from  the  University  of  Western  Ontario. 


117 


INTRODUCTION 


In  recent  times  there  has  been  touch  interest  in  the  solution  of  the 
tine-dependent  Navier-Stokes  equations.  One  reason  is  that,  starting 
from  some  initial  state,  if  a  stable  integration  procedure  Is  chosen  and 
the  integrations  carried  on  for  sufficiently  long  time,  a  steady  state 
solution  may  be  approached.  Also,  time-dependent  integrations  may  be 
used  to  predict  flow  configurations  which  are  basically  unsteady  for  all 
values  of  the  time,  no  matter  how  large.  The  mathematics  of  both  types 
of  problems  is  essentially  the  same  and  may  be  described  as  follows. 

Assuming  incompressibility  of  the  fluid,  the  Navier-Stokes  equations 
can  be  written  in  the  usual  dimensionless  form 

3v/3t  +  (v.^)v  ■  -grad  p  +  R  ^V2^,  (1) 

where  and  p  are  respectively  the  dimensionless  velocity  vector  and  the 
pressure,  t  is  the  time,  and  R  the  Reynolds  number.  For  two-dimensional 
configurations  in  the  (x,y)-plane,  the  equation  of  continuity  div^v  -  0 
can  be  satisfied  by  introducing  the  dimensionless  stream  function 
<|>(x,y,t) ,  related  to  the  velocity  components  (u,v)  by  the  equations 

u(x,y,t)  -  3*/3y,  v(x,y,t)  -  -3ifi/3x  .  (2) 

o 

Also,  the  equation  which  results  from  eliminating  the  pressure  from  (1) 
can  be  expressed  in  terms  of  the  dimensionless  scalar  vortlcity  c(x,y,t), 
viz.  the  nonzero  component  of  curl ^v  ■  (0,0,c).  Thus  it  is  found  that 
the  equations  governing  these  two  quantities  are 


72^  +  t  ■  0 

(3) 

+  it .  R_1y2r 

3t  3y  3x  3x  3y 

(4) 

where  V2  ■  32/3x2  +  32/3y2.  The  necessary  boundary  conditions  are  as 


118 


n 


follows.  Where  the  fluid  Is  in  contact  with  a  solid  boundary  designated 
by  the  curve  C, 

ip  and  3i{//3n  are  known  on  C,  (5) 

where  3/3n  is  differentiation  normal  to  C.  If  the  flow  is  enclosed,  as 
in  the  case  of  convection  in  a  closed  cell,  these  are  the  only  conditions. 
If  it  is  open,  as  in  the  case  of  flow  past  a  cylinder  in  an  unbounded 
fluid,  conditions  at  infinity  must  be  imposed.  Generally  these  reduce  to 
the  fact  that  the  asymptotic  behavior  of  both  \p  and  A  is  known  as  infinity 
is  approached.  However,  the  important  point  in  both  problems  is  that  two 
conditions  are  prescribed  for  on  C  and  none  for  A. 

This  latter  feature  tends  to  make  the  integration  procedure  more 
difficult,  or  slower,  than  it  might  otherwise  be.  The  source  of  slowness 
is  that  the  solution  of  the  Poisson-type  equation  (3)  is  a  boundary-value 
problem  for  a  given  step  in  time  and  has  to  be  solved  by  time-consuming 
boundary-value  techniques,  usually  iterative  methods.  The  solution  of 
equation  (A)  on  the  other  hand,  is  a  marching  (or  step-by-step)  problem 
in  time.  An  integration  of  the  pair  of  equations  proceeds  as  follows. 

To  fix  ideas  consider  the  open  field  type  of  problem  in  which  the  domain 
of  the  solution  is  a  region  D  between  a  closed  contour  C  on  which  the 
conditions  (5)  hold  and  an  outer  contour  ,  on  which  boundary  conditions 
for  both  C,  and  ip  may  be  supposed  known.  Suppose  at  time  t  both  A  and  \p 
are  known  at  all  points  of  a  finite-difference  grid  covering  D  and  also 
at  all  necessary  points  of  C  and  C^.  Then  the  integration  through  time 
t  +  At  consists  of  the  successive  operations: 

(i)  By  forward  integration  of  equation  (A),  A(x,y,t  +  At)  is  determined 
from  c(x,y,t)  at  all  grid  points  in  D  and  on  ,  but  not  on  C  since  no 
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condition  for  c  Is  known  there. 

(11)  From  c(x,y,t  +  At)  at  grid  points  within  D,  i|>(x,y,t  +  At)  is 
determined  from  equation  (3)  as  a  boundary-value  problem  with  i|>  known  on 
C  and  C.. 

(ill)  The  (as  yet  unused)  condition  for  3iJ>/3n  on  C  is  now  used  to  calculate 
C(x,y,t  +  At)  on  C  from  computed  values  of  iMx,y,t  +  At)  at  grid  points  in 
the  neighborhood  of  C.  The  solution  at  time  t  +  At  is  thus  completed 
everywhere.  Successive  applications  of  the  whole  process  determines  the 
solution  at  any  subsequent  time. 

The  main  object  of  the  present  paper  is  to  attempt  to  dispense  with 
the  boundary-value  problem  in  step  (ii)  of  the  above  procedure  in  certain 
classes  of  problem  in  which  the  domain  of  integration  is  a  rectangular 
region.  The  open  field  type  of  problem  mentioned  above  falls  into  such  a 
class,  for  by  a  suitable  conformal  transformation  the  domain  of  the  two- 
dimensional  flow  past  cylinders  of  various  shapes  can  be  mapped  on  to  a 
semi-infinite  rectangle.  It  will  be  shown  that  for  this  problem  we  can 
replace  the  boundary-value  problem  in  step  (ii)  of  the  above  procedure 
by  a  step-by-step  integration  which  permits  the  stream  function  to  be 
determined  by  direct  methods.  In  fact,  the  solution  of  equation  (3)  is 
re-formulated  as  a  step-by-step  problem  in  a  single  space  variable. 

The  basis  of  the  method  for  the  open  field  problem  is  the  following. 
Suppose  step  (i)  of  the  above  procedure  has  been  carried  out.  It  is 
shown  that  certain  necessary  conditions  involving  integrals  of  c(x,y,t  +  At) 
evaluated  throughout  the  region  D  must  at  this  stage  be  satisfied.  The 
satisfaction  of  these  conditions  allows  C(x,y,t  +  At)  to  be  determined 
cn  the  boundary  C  entirely  from  the  internal  solution  for  c(x,y,t  +  At) 
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and  without  any  determination  of  the  internal  solution  for  ij>(x,y,t  +  At). 
The  solution  for  5(x,y,t  +  At)  is  then  complete.  Since  this  function  is 
now  known  on  C  as  well  as  in  D,  it  is  now  possible  to  integrate  equation 
(3)  subject  to  the  conditions  (5)  by  a  step-by-step  procedure  in  space, 
since  clearly  sufficient  boundary  conditions  are  given  to  formulate  a 
step-by-step  approach.  This  procedure  determines  iKx,y,t  +  At)  on  as 
part  of  the  solution  and  this  must  come  out  in  accordance  with  the  known 
boundary  condition  for  ii  on  C^,  giving  an  adequate  check  on  the  procedure. 

In  the  latter  sense  the  method  is  useful  in  the  open  field  problem  because, 
in  fact,  the  precise  boundary  condition  for  i|»  on  at  a  given  time  is  not 
an  easy  one  to  specify.  This  difficulty,  together  with  the  objective  of 
reducing  computer  time  in  time-dependent  studies  of  the  Navier-fitokes 
equations,  has  been  the  main  motivation  for  the  investigation.  However, 
it  is  not  yet  known  how  far  the  latter  objective  has  been  achieved. 

Although  major  attention  is  given  to  the  open  field  problem  of  flow 
past  a  cylinder,  the  method  is  not  restricted  to  this  problem.  As  an 
illustration  another  type  of  problem,  that  of  convection  in  a  closed 
rectangular  cell,  in  considered.  Admittedly  the  method  is  less  satisfactory 
here  although  the  same  objective,  construction  of  a  step-by-step  integration 
of  equation  (3),  is  achieved.  The  two  classes  of  problem  are,  however, 
very  different  in  character  and  illustrate  different  features  of  the 
general  method.  In  both  classes  the  procedure  is  based  on  the  reduction 
of  equation  (3)  to  a  set  of  ordinary  differential  equations  in  one  space 
variable.  This  is  done  by  standard  Fourier  analysis.  The  resulting 
ordinary  differential  equations  are  then  solved  by  step-by-step  methods. 

In  the  open  field  problem  the  integration  range  is,  in  theory,  infinite. 

In  the  closed  cell  problem  it  is  finite.  In  both  cases  a  special  method 
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has  to  be  constructed  for  solving  these  differential  equations,  owing  to 
the  nature  of  their  solutions.  Some  difficulty  might  indeed  be  expected 
if  one  attempts  to  solve  a  Poisson  equation,  essentially  of  elliptic  type, 
by  step-by-step  methods.  The  special  technique  used  for  the  solution  is 
explained  arid  illustrated  by  a  numerical  example  over  a  finite  range  of 
integration. 

As  an  illustration  of  the  open  field  problem,  some  results  for  the 
time-dependent  Integration  of  the  Navier-Stokes  equations  for  flow  past  a 
circular  cylinder  at  Reynolds  numbers  7,  10,  20,  and  40  are  given.  Some 
of  the  results  have  been  computed  on  the  CDC  3600  at  the  University  of 
Wisconsin  and  some  at  the  University  of  Western  Ontario.  Actually,  only 
details  of  the  final  steady-state  solutions  are  given  here,  the  main 
object  being  to  compare  them  with  known  results  in  the  literature.  The 
comparison  is  good  on  the  whole,  indicating  that  the  method  gives 
satisfactory  results. 

PROBLEMS  CONSIDERED  AND  REPRESENTATION  BY  DIFFERENCE  EQUATIONS 

A  typical  problem  of  flow  in  a  closed  rectangular  cavity  is  indicated 
in  Figure  1.  At  time  t  <  0  the  walls  OX,  XY,  YZ  and  ZO  of  the  cavity  are 
at  rest  and  there  is  no  flow.  At  time  t  »  0  the  upper  wall  YZ  is  moved 
with  constant  velocity  u  ■  -1.  All  quantities  are  assumed  to  be  dimension¬ 
less.  The  Reynolds  number  can  be  based  on  the  actual  length  of  one  of  the 
walls,  say  d,  and  the  magnitude  of  the  actual  velocity  of  the  upper  wall 
U.  Thus  R  ■  Ud/v,  where  v  is  the  coefficient  of  kinematical  viscosity. 

The  boundary  C  mentioned  in  the  introduction  is  the  boundary  of  the 
rectangle  and  the  domain  D  its  interior. 

Boundary  conditions  for  the  stream  function  are  obtained  from  the 
equations  (2).  Thus  for  t  <  0,  4i(x,y,t)  ■  0  within  D  and  on  its  boundary 


C.  For  t  v  0  we  have 


On  OX:  ip  ■  3iJ>/3y  -  0 
On  XY:  ip  ■  3ip/3x  »  0 
On  YZ:  tp  -  0,  3ip/3y  -  -1 
On  ZO:  \p  ■  3ip/3x  ■  0  . 

If  the  solution  is  started  from  the  initial  state  of  rest  it  may  be  con¬ 
tinued  indefinitely  subject  to  the  conditions  (6).  For  large  enough  time 
the  solution  will  generally  tend  to  a  steady  state  in  which  all  quantities 
are  independent  of  time.  Equations  (6)  give  the  boundary  conditions  for 
the  steady  problem.  Solutions  to  the  time-dependent  and  steady  problems 
for  various  rectangles,  using  numerical  methods,  have  been  given  in  the 
literature.  The  time-dependent  problem  has  been  considered  by  Greenspan, 
Jain,  Manohar,  Noble  and  Sakurai[l],  Steady-state  solutions  have  been 
given  by  Kawaguti[2],  Simuni[3],  Mills [4]  and  Burggraf  [5],  Both  types  of 
problems  have  recently  been  considered  by  Zabransky  6  . 

Next  consider  the  open  field  problem.  A  typical  problem  is  that  of 
a  cylinder  of  infinite  length  initially  at  rest  in  an  infinite,  motionless, 
fluid.  At  time  t  =  0  it  starts  to  move,  and  subsequently  continues  to 
move,  with  constant  velocity  in  a  direction  perpendicular  to  its  axis. 

The  fluid  at  large  enough  distances  is  assumed  to  remain  undisturbed. 
Actually  we  shall  adopt  the  following  equivalent  formulation.  At  t  <  0, 
the  cylinder  and  fluid  are  moving  together  with  velocity  u  ■  1  parallel  to 
the  fixed  x-axis.  At  t  *  0  the  cylinder  is  brought  to  rest  and  subsequently 
remains  at  rest.  The  fluid  at  large  enough  distances  from  the  cylinder 
is  assumed  to  remain  undisturbed  with  velocity  u  =  1  for  all  t.  If  C 
denotes  the  contour  of  the  cylinder  and  D  the  infinite  domain  outside  it 
then,  with  regard  to  fixed  rectangular  axes  with  origin  inside  C,  the 
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boundary  conditions  for  the  stream  functions  are 


t  <0:  ^  *  y  throughout  D 

t  >  0:  \l>  ■  3iji/3n  -  0  on  C  r  (7) 

3i{i/3y  >  1,  3tJ>/3x  -»•  0,  as  x2  +  y2  -*• 

We  shall  restrict  ourselves  to  flows  which  are  symmetrical  about  the 
x-axls.  Thus  In  practice  D  can  be  limited  to  the  region  of  the  upper 
half-plane  outside  the  upper  half  of  C,  viz.  the  region  above  GHIJK  In 
Figure  2.  Here  HIJ  Is  the  cylinder  and  G,K  are  points  at  Infinity  on  the 
x-axls.  Consider  also  the  curvilinear  coordinate  system  (a,B)  shown  In 
Figure  2.  This  Is  supposed  to  be  derived  from  the  Cartesian  system  by  a 
conformal  transformation  of  the  type 

a  +  IB  -  F(x  +  ly).  (8) 

The  transformaMon  is  chosen  so  that  the  cylinder  HIJ  corresponds  to 
a  -  a*  and  the  parts  JK,  GH  of  the  x-axis  correspond  to  B  -  0,  6  ■  n 
respectively.  The  region  D  above  GHIJK  therefore  corresponds  to  the 
interior  of  the  semi-infinite  rectangle  shown  in  Figure  3. 

It  Is  easily  shown  that  under  the  conformal  transformation  (8)  equation 
(3)  becomes 

V12if>  +  c/H2  -  0  (9) 

and  equation  (A)  becomes 


3? 

Dt 


H2(R-1V12C 


li  lit  JL£n 

3a  36  "  36  3a; 


(10) 


Here 


7X2  -  32/3a2  +  32 / 3 62 


and 


H2  ■  (3a/3x)2  +  (3a/3y)2  ■  (36/3x)2  +  (36/3y)2. 


The  use  of  transformations  of  this  kind  is  well  established  in  the 
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literature.  Thus,  for  example,  Payne  [7]  and  Kavaguti  and  Jain  jjij  have 
used  the  particular  case 

F(x  +  ly)  -  log (x  +  iy)  (11) 

for  which 

H2  -  e  20  (12) 

in  calculations  of  the  time-dependent  flow  past  a  circular  cylinder.  Here 
a  ■  0  corresponds  to  the  cylinder  x2  +  y2  *  1.  Apelt  and  Keller  and 
Takami  [lo]  have  used  the  same  transformation  in  calculations  of  the 
corresponding  steady  Clow  (3t;/3t  ■  0  in  equation  (10)).  The  case  of 
steady  flow  past  a  finite  flat  plate  occupying  the  region  |x|  <  1  of  the 
x-axis  has  been  considered  by  Janssen  [ll]  and  by  Dennis  and  Dunwoody  [l2^ 
using  the  particular  case 

F(x  +  iy)  ■  cosh  1  (x  +  iy) 

for  which 

H2  -  2/(cosh  2a  -  cos2B). 

In  this  case  a*  *  0  corresponds  to  the  plate  and  it  may  be  noted  that 
taking  a*  equal  to  any  positive  constant  gives  an  elliptic  cylinder  whose 
ratio  of  minor  to  major  axes  is  tanh  a  ,  Other  transformations  of  the 
same  type  can  be  employed  to  correspond  to  cylinders  of  various  other 
shapes,  e.g.  the  generalized  Joukowski  aerofoil. 

Finally,  it  remains  to  state  boundary  conditions  appropriate  to  the 
transformed  (a,8)-plane  of  Figure  3.  Obviously  the  first  condition  of  (7) 
will  depend  upon  the  particular  transformation  used.  For  the  other 
conditions  of  (7)  we  use  the  relation  between  the  derivatives 


(13) 


3x  _3i|;  _3^ 

3a  " 

3a  3x 

3a  3y 

3<P  m 

3x  3ip  3y  3iji 

3B 

38  3x 

38  3y 

125 


Provided  that  the  derivatives  of  x  and  y  with  regard  to  a  and  8  remain 
finite  at  a  ■  a*,  we  can  therefore  take 

t  0:  tp  ■  ■  0  when  a  “  a*.  (14) 

For  the  conditions  at  infinity  it  may  be  observed  that  for  both  of  the 
particular  cases  cited  above  we  can  deduce,  using  (13),  that 

t  >  0:  e  a  3i(i/3a  -*■  ksinB 

e”a  3i|)/36  -*•  kcosfi,  as  o  +  •, 

Here  k  is  a  numerical  constant  such  that  k  ■  1  for  the  circular  cylinder 
and  k  —  *s  for  the  elliptic  cylinder  or  flat  plate.  For  other  types  of 
cylinders,  e.g.  the  Joukowskl  aerofoil  already  mentioned,  the  same 
conditions  (15)  hold  and  we  shall  take  (14)  and  (15)  to  be  general  conditions 
governing  the  problem.  As  a  consequence  of  the  uniform  stream  condition 
at  infinity,  it  follows  that  for  all  time 

C  0  as  a  ■*  •.  (16) 

Lastly,  since  symmetrical  conditions  have  been  assumed,  both  i|>  and  c  must 
vanish  on  GH,  JK  (Figure  2)  for  all  time,  viz. 

<(/  ■  5  ■  0,  when  B  ■  0,it  .  (17) 

Next,  consider  the  formulation  of  these  problems  in  terms  of  finite- 
differences.  For  the  problem  of  flow  past  a  cylinder  we  must  limit  the 
senl-liflnlte  domain  in  Figure  3  by  an  artificial  boundary.  This  is 
designated  by  the  line  XY  and  it  corresponds  to  the  curve -C^  mentioned  in 
the  introduction.  For  both  problems  we  now  have  a  finite  rectangle  OXYZ 
as  the  domain  D  of  the  numerical  solution.  We  suppose  this  to  be  covered 
by  a  rectangular  grid  with  lines  parallel  to  the  sides  of  the  rectangle. 

A  typical  set  of  points  on  this  grid  is  shown  in  Figure  4.  The  spacing  in 
the  two  directions  is  not  assumed  necessarily  to  be  the  same  and  we  write 
r*  ■  h2/h^2.  Only  the  equations  (9)  and  (10)  need  be  considered.  The 
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case  H  -  1  will  then  correspond  to  equations  (3)  and  (4). 

Using  customary  central-difference  approximations  to  second  derivatives 
(see  e.g.  Fox  [l3]X  the  usual  approximation  to  equation  (9)  is 

r*(iju  +  iM  -  (2  +  2r*)  *  +  h2c  /H  2  -  0.  (18) 

1  J  i  4  o  o  o 

The  approximation  to  equation  (10)  will  depend  upon  whether  an  explicit 

or  implicit  scheme  in  time  is  adopted .  This  is  not  really  relevant  to  the 

present  paper  and  we  illustrate  with  the  simple  first-order  explicit  scheme. 

Denoting  the  right  side  of  (10)  by  L  we  have,  at  the  typical  point  0, 

C  ( t  +  At)  -  C  (t)  +  At  L  .  (19) 

O  0  o 

The  quantity  must  be  exnressed  in  differences.  For  simplicity  write 

3 ip / 3ot  -  X,  — 3ip /3 B  ■  u.  (20) 

Then  if,  for  example,  the  ^-derivatives  are  expressed  in  central  differ¬ 
ences,  we  have  as  an  approximation 

H“Vl  -  (R-1  +  hu  /2 H.  +  (P_1-  hu  /2)c_ 
o  o  o  1  o  j 

-1  -1  1  <21) 

+  r*{  (R  +  h.X  /2)t,  +  (R  -  h.X  /2)C,)-R  <2  +  2r*k  . 

Lot  104  O 

These  equations  are  sufficient  to  carry  out  the  process  described  in 
the  introduction.  Stage  (i)  can  be  carried  out  at  all  internal  grid  points 
using  (19).  At  boundaries  where  C  is  known,  e.g.  OX,  Y7-  and  XY  (using 
(16)  as  an  approximation)  in  Figure  3,  there  exists  no  problem.  It  clearlv 
cannot  be  carried  out  at  the  boundary  0Z  in  Figure  3,  or  at  any  boundary 
in  Figure  1.  Stage  (ii)  is  carried  out  by  solving  (usually  iteratively) 
the  matrix  problem  defined  by  the  equations  (18)  with  known  conditions  for 
4>  on  the  boundaries.  In  the  problem  of  closed  cell  convection  we  can  take 

ip  ■  0  on  all  boundaries.  In  the  cylinder  problem  ip  ■  0  on  all  except  XY. 

The  condition  on  XY  requires  further  consideration.  It  is  certainly 

incorrect  to  take  ■  easinB  for  the  finite  value  a  -  a  corresponding  to 
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XY.  The  simplest  possibility  seems  to  be  to  use  the  first  of  the  conditions 
(15)  as  a  slope  boundary  condition  on  XY,  but  the  only  really  satisfactory 
way  is  to  determine  the  asymptotic  nature  of  the  solution  for  large  a. 

For  the  time-dependent  problem  this  is  difficult. 

Finally,  in  stage  (ill),  ;  is  calculated  at  any  boundary  where  it  lo 
not  known.  For  example,  at  the  boundary  OZ  in  Figure  3,  the  central 
difference  approximation  to  the  condition  3<p/3a  -  0  gives,  approximately, 


Also  m  m  i>,  -  0 

and  hence  (18)  gives 

Co  -  -ZH^/h2,  (22) 

which  determines  C  at  any  point  on  this  boundary  in  terms  of  the  computed 
Internal  values  of  1 1>,  In  this  problem  this  is  the  only  boundary  at  which 
C  must  be  so  determined.  It  is  hardly  necessary  to  give  the  corresponding 
equations  for  finding  C  on  the  other  boundaries  of  Figure  1.  Thus  from 
the  given  Initial  state,  and  with  time  steps  At  chosen  according  to  some 
suitable  stability  criterion,  the  solution  of  either  problem  can  be  con¬ 
tinued  indefinitely. 

METHOD  OF  SOLUTION 

The  object  of  this  paper  is  to  consider  some  modifications  to  the 
method  of  solution  of  equation  (9).  The  cylinder  problem  is  particularly 
suitable.  Since  <|/  -  0  when  6-0  and  6  -  if  for  all  t,  we  may  assume  the 
Fourier  expansion 

00 

iKa,6,t)  m  l  f  (a,t)  sin  n6  (23) 

n-1 

satisfying  these  conditions.  Term-by-term  differentiation  with  regard  to 
8  is  justified  (see  e.g.  Jeffreys  and  Jeffreys  [ia]J).  If  this  series  is 
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substituted  In  (9)  and  we  multiply  by  the  general  term  sin  nB  and  Integrate 
from  8  ■  0  to  6  ■  it,  we  obtain  the  equations 

fn  -  "2f„  ■  V”-1’  •  <2‘> 

where  -  rir 

r  (a,t)  -  -  7  (C/H2)sln  nB  dB.  (25) 

n  "  i0 

The  equations  hold  for  n  -  1,2,3,...  and  the  primes  denote  differentiation 

with  regard  to  a,  l.e.  f*  -  3f  /3a.  Effectively  we  can  treat  these 

n  n 

equations  as  ordinary  differential  equations.  Although  the  time  Is  present 
In  the  solutions  It  enters  only  through  Its  presence  in  the  quantities 
rn(a,t).  The  Integration  of  equations  (24)  is  carried  out  at  a  fixed  time 
and  her.ce  we  can  think  of  the  solutions  as  dependent  only  on  a  at  this 
fixed  time. 

The  boundary  conditions  for  equations  (24)  follow  from  (14)  and  (15). 
For  all  values  of  n 


f  -  f' 
n  n 

-  0 

when  a  ■  a* 

(26) 

and,  as  a  -*■  <*>, 

-a, 

e  f  -*> 

M. 

e_af '  -*•  k6 

(27) 

n 

n 

n  n 

where 

‘l-1’ 

6  * 

n 

0(nM). 

These  conditions  can  now  be  used  along  with  the  equations  (24)  to  deduce 
a  necessary  condition  on  rn(a,t)  which  holds  for  all  values  of  the  time. 


Let 

pn(a,t)  -  f;  +  nfn. 

(28) 

Then 

h  -  «"nVa-t> 

and  hence 

e"nap  (a,t)  -  |  e”nzr  (z,t)dz, 

n  ia*  n 

(29) 

since  p  ■  0  when  a 
rn 

■  a*.  Now  let  a  -*■  »  in  (29).  From  (27)  we 

find  that 

ra 

e  nzr  (z,t)dz  ■  2k,  if  n  *  1 

Ja*  n 

(30) 

-  0  ,  if  n  +  1  .  _ 
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Consider  now  how  the  result  (30)  may  be  used  in  the  step-by-step 

time  Integration  previously  described.  Let  stage  (1)  be  completed  using, 

say,  the  scheme  (19).  Then  4  is  known  everywhere  except  on  o  ■  a*  and 

rn(a,t)  can  be  found  from  (25)  for  corresponding  values  of  a.  If  these 

values  are  substituted  into  (30),  using  a  formula  of  numerical  quadrature, 

this  allows  us  to  determine  r  (a*,t)  for  any  required  value  of  n.  In 

n 

practice,  of  course,  the  upper  limit  in  the  integral  in  (30)  is  replaced 

by  the  finite  value  a  corresponding  to  the  boundary  XY  in  Figure  3. 

in 

Having  found  rn(a*,t),  the  vorticity  on  the  boundary  a  ■  a*  can  be 

found  from  the  series  which  results  from  the  Inversion  of  (25),  viz. 

00 

C(a,8,t)  -  -H2  l  r  (a,t)  sin  nB.  (31) 

n-1 

We  thus  succeed  in  determining  t,  at  a  »  a*  without  explicitly  integrating 
the  equations  (24),  i.e.  stage  (iii)  is  completed  before  stage  (ii).  The 
essential  point  now  is  that  since  rn(a,t)  is  known  for  all  stations  a, 
Including  a  -  a*,  the  equations  (24)  can  be  Integrated  by  initial-value 
techniques  with  (26)  as  Initial  conditions.  From  the  computed  ffl  we  can 
calculate  iKa,8,t)  and  we  are  then  ready  to  enter  stage  (i)  again. 

The  treatment  of  the  closed  cell  problem  is  similar.  In  order  to 
preserve  comparability  with  the  analysis  just  given  we  shall  take  the 
specific  dimensions  0Z  -  it,  OX  ■  l  in  Figure  1.  With  the  change  in 
variables  of  x  for  a,  y  for  B  and  with  H  ■  1,  equations  (23),  (24)  and  (25) 
hold  good.  The  second  and  last  of  the  boundary  conditions  (6)  now  give 
rise  to  the  conditions 

f  ■  f*  ■  0  when  x  ■  0,1  (32) 

n  n 

for  the  functions  f  (x,t).  Equation  (29)  holds  if  we  put  o  -  x,  a*  -  0 

n 

and  now,  from  (32),  pR  ■  0  when  x  ■  i  and  hence 
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as  the  single  equation  (30)  in  the  previous  problem.  For  given  n,  each 

may  be  expressed  as  a  formula  of  numerical  quadrature.  Then,  given  rn(x,t) 

for  every  station  x  except  x  -  0  and  x  «  i,  two  simultaneous  equations  are 

obtained  to  determine  r  (0,t)  and  r  (f,t).  With  this  additional  information 

n  n 

it  is  possible  to  solve  the  equations  (24)  using  step-by-step  methods. 

In  practice  several  new  points  occur  in  this  problem.  One  is  that 

C(x,0,t)  and  4(x,r,t)  are  nonzero  and,  further,  they  will  be  unknown  at 

the  stage  when  rn(x,t)  is  evaluated.  If  conventional  methods  of  numerical 

quadrature  are  used  to  evaluate  r^(x,t),  no  additional  problem  occurs. 

The  integrand  of  the  Integral  in  equation  (25)  is  zero  at  the  end  points 

and  the  values  of  C  at  these  points  do  not  affect  t))e  integration.  However, 

it  is  found  that  specialized  formulae  have  to  be  used  for  the  integration 

and  this  does  create  difficulties  in  the  problem  of  flow  in  the  closed  cell. 

We  shall  return  to  this  later.  Another  point  in  this  problem  is  that  it 

is  not  possible  to  use  the  equation  (31)  to  determine  C  on  the  boundaries 

y  ■  0  and  y  -  it  even  when  r  (x,t)  is  known.  The  series  on  the  right  does 
J  n 

not  converge  to  the  function  on  the  left  at  these  points  unless  C  is  zero 
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there,  l.e.  we  get  the  Gibbs  phenomenon.  It  would  be  surprising  if  it 
were  possible  because  we  have  not  yet  utilized  the  slope  boundary  conditions 
for  on  y  ■  0,ir.  This  point  also  will  briefly  be  considered  later. 

NUMERICAL  ANALYSIS 

Two  problems  must  be  considered,  the  evaluation  of  the  integral  in 
(25)  just  mentioned,  and  the  solution  of  the  equations  (24)  by  step-by-step 
methods.  Each  presents  some  difficulty  if  treated  by  conventional  methods 
and  we  shall  consider  them  in  turn. 

The  problem  with  the  integral  on  the  right  side  of  (25)  is  that  unless 
a  very  fine  grid  is  used  in  the  S(or  y)  direction  the  result  of  conventional 
numerical  quadrature  will  be  Inaccurate  when  n  is  large  and  the  periodic 
function  has  many  zeros  in  the  range.  Filon  [l5j  pointed  this  out  and 
proposed  special  formulae  for  such  cases.  The  principle  is  based  on  poly¬ 
nomial  approximation  to  the  nonperiodic  part  of  the  integrand  only,  rather 
than  to  the  whole  Integrand,  as  would  be  done  in  conventional  methods. 

Filon  gives  details  when  the  approximating  polynomial  is  a  parabola  over 
three  successive  equally-spaced  values  of  the  integrand.  For  the  Integral 
on  the  right  side  of  (25)  the  result,  for  integration  over  the  points 
2,  0,  4  shown  in  Figure  4,  is 

(?4  cos  nS^-C2  cos 

{(344-  4Cq+  C2)sin  n&4  +  U4"  4Cq+  342)sin  n62) 

(t4~  2;q  +c2)(cos  nB4-  cos  nS2)  .  (37) 

8  ■  0  to  B  ■  tt,  assuming  an  even  number  of 


fB  2  1 

Sain  nB  dB^*— 

8 

4 

1 


2h1n 


_  1 


,  2  3 
h^  n 


The  integral  over  the  range 
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intervals,  Is  obtained  by  summing  this  result.* 

Filon  integration  allows  us  to  evaluate  rn(<X,t)  with  good  accuracy 
when  n  is  large  but  it  does  create  the  difficulty  that  the  function  t  needs 
to  be  known  at  the  end  points.  This  presents  a  problem  only  in  the  case 
of  the  closed  cell  flow.  We  shall  consider  this  point  later. 

Next  consider  the  solution  of  the  equations  (24)  using  step-by-step 

methods.  It  is  convenient  to  drop  the  subscripts  and  also  to  consider  the 

solutions  as  functions  of  a  single  variable,  say  x,  i.e.  we  consider  them 

at  fixed  time.  Thus  we  consider  the  single  equation 

f"  -  n2f  -  r(x)  (38) 

and,  to  start,  consider  the  solution  defined  in  the  finite  range  x  ■  0  to 

x  *  £.  with  the  conditions  (corresponding  to  (32)) 

f (0)  -  f ' (0)  -  0  (39a) 

.  f(£)  -  f'U)  -  0  .  (39b) 

We  assume  r(x)  to  be  given  at  all  grid  points  on  the  interval  x  0  to  x  ■£, 

including  the  end  points.  At  this  stage  also  we  shall  abandon  the  notation 

of  the  points  in  Figure  4  and  suppose  that  a  typical  grid  point  in  the  x- 

direction  is  denoted  by  x  ■  mh  (m  ■  0,1,2,...)  and  that  f  denotes  f(x  ). 

m  mm 

In  theory,  only  the  initial  conditions  (39a)  are  necessary  to  integrate 
(38)  as  a  step-by-step  problem.  However,  most  step-by-step  procedures 
applied  to  this  problem  are  unsatisfactory,  particularly  if  n  is  large. 

For  example,  the  direct  second-order  approximation  using  central  differ- 


*  For  further  details  of  this  result,  including  the  error  term,  see  the 
National  Bureau  of  Standards  Handbook  of  Mathematical  Functions  (seventh 
Printing,  May  1968)  p.  890  together  with  the  footnote,  p.  890.  The  form 
we  give  in  equation  (37)  is  equivalent  to  Filon' s  actual  form,  but 
expressed  differently. 
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ences  gives  the  approximation 

lm»  *  h’Vl  +  <2  +  <*» 

Assuming  some  starting  procedure  which  gives  f^,  equation  (AO)  can  be  used 
to  construct  an  approximate  solution  starting  from  m  ■  0,  but  an  elementary 
error  analysis  (see  practically  any  bextbook  dealing  with  this  subject) 
Indicates  that  the  error  propagation  is  unsatisfactory.  The  error  at  some 
fixed  value  of  x  is  unbounded  as  h  -*•  0  for  all  values  of  n.  Also,  for 
fixed  h  the  error  at  fixed  x  Increases  rapidly  with  n.  In  the  present 
application  it  is  obvious  that  h  will  remain  fixed  regardless  of  n  so  that 
this  latter  property  makes  the  use  of  a  formula  such  as  (AO)  unsuitable. 

To  some  extent  the  error  growth  is  associated  with  the  presence  of  the 
increasing  exponential  term  exp(nx)  in  the  complementary  function  of  (38). 
This  term  plays  little  part  in  the  required  solution  (in  the  cylinder 
problem  it  obviously,  from  the  conditions  (27),  plays  no  part  if  n  >  1) 
so  the  error  growth  is  unacceptable.  For  the  same  reason  any  attempt  to 
express  the  equation  (38)  as  two  simultaneous  first  -order  equations  with 
known  initial  conditions  leads  to  a  basically  unsatisfactory  system. 

As  an  example,  consider  the  pair  of  first-order  equations  defined  by 
using  the  functions  given  by  equations  (28)  and  (3A).  The  function  p(x) 
(using  the  notation  of  the  present  section)  satisfies  the  equation 


and  q(x)  satisfies 


p'  -  np  -  r(x) 


q'  +  nq  ■  r(x)  . 


The  initial  conditions  are  p(0)  -  q(0)  -  0.  If  we  express  the  first 
derivatives  by  simple  forward  differences,  an  approximation  to  (Al)  (the 
Euler  method)  is 


13A 


«3) 


W  '  (1  +  nh>  p*  +  hrn 

and  to (4 2) 

Vl  *  <1  -  ■!>)  q.  +  hrB  .  (U) 

The  error  growth  associated  with  (43)  and  (44)  can  be  discussed  following 
the  analysis  given  by  Forsythe  and  Wasow  £l&J  .  That  associated  with 
(43)  will  be  unacceptable,  but  that  associated  with  (44)  may  be  accept¬ 
able.  The  reason  is  that  the  unwanted  Increasing  exponential  part  of 
the  complementary  function  of  (38)  has  been  Isolated  in  (41)  and  the  de¬ 
creasing  part  in  (42). 

In  the  present  application,  however,  we  not  only  have  initial 
conditions  for  p(x)  and  q(x)  but  also  the  conditions  p(fc)  ■  q(£)  ■  0, 
i.e.  more  conditions  than  are  theoretically  needed.  With  the  aid  of  these 
a  satisfactory  procedure  can  be  constructed.  Equation  (42)  is  integrated 
in  the  increasing  x-direction  with  initial  condition  q(0)  »  0.  Equation 
(41)  is  Integrated  backwards  with  p(£)  -  0.  An  equivalent  way  of  looking 
at  the  latter  integration  is  to  put  x  ■  l  -  x'  in  (41)  whence  it  becomes 

dp/dx'  +  np  -  -r(£  -  x'),  (45) 

with  initial  condition 


p  ■  0  when  x'  ■  0.  (46) 

Equation  (45)  is  now  of  the  same  character  for  Increasing  argument  as 
equation  (42)  and  the  error  growth  is  at  least  tolerable. 

From  the  functions  p(x)  and  q(x)  computed  in  this  way,  f(x)  and  f'(x) 
are  obtained  from  the  equations 

f  -  (p  - 
f'-  (P 


-  q)/2n 
+  q)/2  .  J 


(47) 


Also,  a  highly  sensitive  check  on  the  numerical  procedure  can  now  be  made. 
It  is  that  f  and  f'  computed  from  (47)  must  now  come  out  to  be  zero 


135 


(within  an  acceptable  numerical  tolerance)  at  both  x  ■  0  and  x  ■  1. 

Moreover,  this  cannot  possibly  happen  unless  the  conditions  (33)  and  (36) 
have  been  correctly  satisfied.  The  check  therefore  tests  this  in  addition 
to  the  accuracy  of  the  step-by-step  process;  it  is  a  very  severe  one. 

It  is  now  clear  froii  the  above  that  we  may  restrict  further  consider¬ 
ation  to  the  question  of  constructing  an  approximation  to  equation  (42) 
with  the  initial  condition  q(0)  ■  0.  A  detailed  error  analysis  of  the 
approximation  (44)  indicates  that,  although  the  approximation  is  satisfactory 
in  a  manner  not  shared  by  the  approximation  (43) ,  it  cannot  be  expected 
to  be  effective  unless  nh  is  small  compared  with  1.  We  certainly  cannot 
expect  to  obtain  an  accurate  solution  unless  this  condition  is  satisfied. 
Since  h  is  fixed  and  n  may  be  large  this  will  not  in  general  be  satisfied. 

A  method  must  be  constructed  which  does  not,  on  the  whole,  lose  accuracy 
as  n  increases.  Such  a  method  is  suggested  by  the  principle  of  Filon 
quadrature . 

We  denote  as  before  the  value  x  -  mh  by  x  and  the  corresponding  value 

m 

q(x  )  by  q  .  Equation  (42)  may  be  written 
mm 

-r-  (e  q(x) }  -  e  r(x) 


whence 


q(x) 


.  e-n<x-x,n) 


Sn 


+  e 


-nx 


rx 


e"Cr(0 


dt. 


(48) 


m 

If  x  -  x  +  kh,  this  formula  is  a  multi-step  formula  for  the  step-by-step 
m 

integration  of  equation  (42) .  The  integral  extends  over  k  steps  and  we 
suppose  in  the  usual  way  that  r(x)  is  approximated  by  a  polynomial  P(x) 
passing  through  some,  or  all,  of  the  corresponding  k+1  points  and  perhaps 
also  through  points  outside  the  range.  Let  the  error  of  this  approximation 
be 


E(0  -  r(C)  -  P(0  . 


(49) 
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Let  q*(x)  satisfy  (48)  when  r(x)  is  approximated  by  P(x)  and  let 

e(x)  -  q(x)  -  q*(x)  (50) 

denote  the  error.  Then 

e(x)  -  e“n(x‘xo)  t  +  T(x)  ,  (51) 

in 

where  T(x)  is  the  truncation  error  and  is  giver  by 

T(x)  -  e'nx  f  en*  E(0  d£  .  (52) 

J*m 

Various  estimates  of  T(x)  can  be  made,  for  example,  a  crude  estimate 

is  obtained  by  assuming  only  that  r(x)  is  continuous  on  the  Interval 

(x  ,x).  By  definition  E(x)  is  continuous  on  the  same  interval  and  thus 
m 

T(x)  -  E(^)  {1  -  e'n(x_x«))/n,  (53) 

where  x  <  <  x.  Thus  T(x)  cannot  exceed  the  maximum  error  of  the 

m  1 

interpolating  polynomial  for  any  value  of  n. 

Only  the  cases  k  ■  1,  k  ■  2  will  be  considered  in  detail  as  these 
are  of  the  most  value  in  practical  Integrations.  The  case  k  •  1  gives  a 


propagation.  For  given  h,  y  <  1  and  the  error  after  m  steps  is  bounded 


as  m  -*■  •.  Moreover,  as  h  -*•  0  the  error  at  a  fixed  value  x  ■  mh  remains 
bounded  since  ym  -*■  e  nx.  For  convergence,  suppose  xq  is  the  initial  point 
x  ■  0  and  that  the  initial  condition  (actually  q  ■  0  in  the  present 
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application)  has  been  calculated  exactly,  Then  tQ  ■  0  and  convergence 

is  established  if  S  -*•  0  as  h  -*■  0  for  fixed  x  ■  mh.  Clearly  S  -*■  0  if  T  -►  0. 

o  m 

From  (53),  for  all  m, 


Vi '  hE«i> 


and  E(^)  certainly  remains  bounded.  Thus  T  -*■  0. 
known,  if  P(x)  passes  through  the  j  +  1  points 


Actually,  as  is  well 


V  xi+r  ••••*  Xi+j 

and  r(x)  possesses  continuous  derivatives  up  to  order  j+1  on  an  interval 
which  contains  these  points  and  the  argument  x,  then 

E(x)  -  (x  -  Xl)(x  -  x1+1)...(x  -  xi+j)r(J+1)(n)/(j+l):  (57) 

where  n  lies  on  the  given  interval.  In  this  case  E(C^)  ■  0(h^+*)  as 
h  -+■  0. 

The  formula  for  q*  for  the  one-step  case  is 


q*+l  -  e"nhq£  +  e"nX"W-l  J*"*1  enCP(OdC.  (58) 

xm 

The  simplest  form  for  P(S)  is  to  take  it  constant,  i.e. 

P(0  “  r(x  )  -  r  . 

ui  Ill 

This  corresponds  directly  to  the  Euler  formula  in  the  approach  using 
standard  difference  methods.  The  approximation  obtained  from  (58)  is 

q*tl  ■  e  nhq£  +  rn(l  -  e  nh)/n  .  (59) 

If  we  assume  the  expression  (57)  for  the  error,  with  i  -  m  and  ]  ■  0  in  this 
case,  substitution  in  (52)  yields 

T  -  e  n%m¥l  [XnrflU  -  x  )r'(n)  e"Cd£  .  (60) 

m+1  J  m 

'x 

m 

If  we  apply  the  mean  value  theorem  then 

w  ■  «"”vi  *•<»!>  <M) 

m 
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where  x  <  n,  <  x_  . , .  We  can  deal  with  (61)  in  two  ways,  firstly  by 
m  l  o+i 

evaluation  of  the  Integral,  which  yields 


r'(r),)  nh 

Vi  •  -r*- ,h  - (1  -  «  >'">  • 

Alternatively,  we  can  again  employ  the  oean  value  theorem  to  give 

n(?l  -  x_^n) 


(62) 


m+1 


1 
2  6 


m+1  .2  , ,  . 

h  r'^)  , 


(63) 


where  x  <  £,  <  x  ...  The  two  forms  clearly  become  equivalent  if,  for 
m  1  m+1 

fixed  n,  h  ■+•  0.  However,  in  general  they  display  different  properties. 

Equation  (62)  shows  that  the  truncation  error  decreases  with  u.  Equation 

2 

(63)  Indicates  that,  regardless  of  n,  the  truncation  cannot  exceed  h  r'(n,)/2, 
which  is  what  it  would  be  if  the  Euler  method  were  applied  to  Integrate  the 
basic  differential  equation  (42)  with  n  •  0. 

A  similar  situation  exists  if  P(x)  is  taken  as  the  straight  line 


joining  x  to  x. . .  The  truncation  error  is  then 
in  htti 


-nx 


m+1 


m+1 


m+1  2 ! 


(C  -  xJU  -  xjttfl)r"(n)e  dC 


m 

and  it  is  easily  shown  that  T^  cannot  exceed  in  magnitude  the  term 
3 

-h  r"(n,)/12,  where  x  <  n,  <  x  . . .  This  term  gives  the  truncation  error 
1  ml  m+1 

when  trapezoidal  integration  is  applied  to  the  special  case  n  ■  0  of 
equation  (42).  We  can  clearly  write  down  a  formula  of  similar  type  for 
the  truncation  error  when  P(x)  is  a  polynomial  of  any  degree.  However, 
the  general  investigation  of  this  error  is  then  more  difficult.  In 
particular,  the  term  E(x)  changes  sign  over  the  range  of  integration  and 
the  process  of  getting  simple  estimates  of  the  kind  Just  given  is  more 
involved.  Although  the  point  has  not  been  investigated  fully  it  does 
seem  to  be  possible,  following  methods  used  by  Steffensen  [l7^]  in  the 
theory  of  numerical  quadrature,  to  establish  the  following  result.  For 
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any  given  polynomial  approximation  P(x),  the  magnitude  of  the  truncation 
error  in  approximating  (48)  does  not  exceed  the  magnitude  of  the 
corresponding  truncation  obtained  when  n  is  put  equal  to  zero  in  this 
equation. 

We  turn  finally  to  the  actual  formulae  utilized  in  the  numerical 

studies  of  this  paper.  Two  formulae  have  been  used,  the  first  a  one-step 

formula  to  effect  the  first  step  of  the  integration  from  the  initial  point 

xq  to  the  point  x^,  but  using  parabolic  approximation  to  r(x)  over  the 

points  xq,  x^,  Xj.  The  second  formula  used  was  the  two-step  formula 

obtained  by  putting  x  ■  x^  +  2h  in  (48)  and  using  parabolic  approximation 

for  r(x)  over  the  successive  points  x  ,  x  , ,  x _.  The  appropriate 

m  tn+l  vttrZ 


formulae  for  q*  are  as  follows.  For  the  first  of  these  two 

rl 


q*  .  yq*  +  I  (r^  yrj - L.  {r<)_  r__  y(4ri-  3r_-  r0)} 


2hn 


2  li2  o 


o  2; 


+  (V  2Vr2)(H)' 

h  n 

where  y  is  as  previously  defined.  For  the  second 


(64) 


•C+2  ’  <4  +  £(rmt2'  T*r»)-  ,(W.>-  '‘'Vl'Vrt” 


(r  -2r  . ,+r  .  „) (1-y*). 


.2  3  '  m  m+1  m*2' 
h  n 


(65) 


-2nh 


Here  y*  -  e 

The  resemblance  between  (65)  and  the  Filon  quadrature  formula  (37)  is 
obvious . 

The  question  of  error  propagation  for  two-step  formulae  such  as  (65) 
may  be  considered  in  a  similar  manner  to  that  in  the  one-step  case.  The 
error  equation  is 

S*2  ’  y*cm  *  W  (66) 

where  T^  is  given  by  putting  x  ■  xm)  „  in  (52) .  Actually  the  only  new 
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point  is  that  (66)  has  two  Independent  solutions,  one  for  even  values  of 
m  generating  from  the  initial  value  Cq  *nd  the  other  for  odd  values  of  m 
generating  from  e^.  The  general  error  propagation  properties  of  these 
two  solutions  are  essentially  similar  to  those  discussed  in  the  one-step 
case  and  need  not  be  considered  further.  In  practice  will  be  determined 
by  the  starting  approximation  to  q.  This  can  be  obtained  using,  for 
example,  (64)  or  any  other  one-step  formula.  Generalization  of  these 
results  to  the  multi-step  case  is  obvious  and  we  shall  give  no  further 
discussion  of  this. 

The  truncation  error  T^  in  (64)  is  obtained  by  putting  m  •  0,  x  ■  x^ 
in  the  formula  (52).  Also  E(x)  is  given  by  (57)  with  i  -  0,  J  -  2.  The 
polynomial  part  of  E(x)  does  not  change  sign  over  the  range  of  integration 
so  it  is  easy  to  deduce  that  T^  •  e"^1  h^h^r'"  (n1)/24,  where  0  <  <  h 

and  0  <  <  2h.  For  the  two-step  formula  (65),  E(x)  is  given  by  putting 

i  -  m,  j  ■  nrt-2  in  (57).  In  this  case  the  polynomial  part  of  E(x)  does 
change  slgr  over  the  range  of  integration.  We  have  not  investigated  this 
case  fully  but,  as  previously  noted,  it  is  believed  that  the  truncation 
can  be  shown  to  be  not  greater  in  magnitude  then  that  obtained  by  putting 
n  ■  0  in  (48) .  In  this  case  this  would  give  the  truncation  not  greater 
in  magnitude  than  that  of  the  Simpson  formula  applied  to  the  function  r(x). 

Finally  we  give  some  results  of  a  specific  numerical  illustration  of 
the  whole  computational  procedure  of  solving  equation  (38)  subject  to  the 
boundary  conditions  (39a)  and  (39b).  Some  computations  were  carried  out 
to  test  the  method.  In  this  test  r(x)  was  taken  to  be 

r(x)  ■  100  +  ae  +  be 

with  a(n)  and  b(n)  chosen  to  satisfy  the  necessary  conditions  (33)  and  (36). 
The  range  of  Integration  was  taken  as  i  -  1.  Equation  (42)  was 
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Integrated  In  the  positive  x-directlon  and  equation  (41)  In  the  negative 
x-dlrectlon  as  already  described.  Both  Integrations  were  carried  out 
using  formulae  of  type  (64)  and  (65),  equation  (64)  being  used  for  the 
first  step  and  (65)  being  used  to  continue  the  Integration. 

Computations  were  carried  out  for  integer  values  of  n  from  n  ■  1  to 
n  "  20.  For  each  case  two  approximate  solutions  were  obtained  using 
h  ■  0.1,  0.05  respectively  and  the  results  compared  with  the  known  exact 
solution.  This  comparison  was  found  to  be  extremely  satisfactory.  In 
Table  1  some  comparisons  are  given  for  one  value  only,  the  value  f(0.5), 
but  on  the  whole  the  result  is  representative  of  those  obtained  over  the 
complete  range.  The  values  are  given  in  floating  decimal  form. 


n 

Exact 

h  -  0.1 

h  -  0.05 

1 

1.87926E-01 

1.87641E-01 

1.87925E-01 

5 

1.19699E-01 

1.19699E-01 

1.19699E-01 

10 

5.65582E-02 

5.65622E-02 

5.65584E-02 

15 

3.04646E-02 

3.04671E-02 

3.04647E-02 

20 

1.86811E-02 

1.86827E-02 

1.86811E-02 

TABLE  1. 

Values  of  f(0.5) 

Apart  from  the  very  curious  feature  exhibited  by  the  case  n  ■  5,  the 
evidence  of  this  table  is  that  the  approximation  improves,  as  n  increases; 
most  certainly  it  does  not  get  worse.  We  may  summarize  some  other 
features  of  the  numerical  solutions  as  follows.  For  the  case  h  ■  0.05, 
the  solutions  computed  using  the  step-by-step  process  agree  with  those 
computed  from  the  exact  solution  to  about  6  decimal  places  (not  significant 
figures)  at  every  grid  point  for  every  value  of  n  ■  1  to  n  -  20.  In  this 
case  also,  the  check  that  f(0)  and  f(l)  should  both  come  out  zero  in  the 


142 


final  solution  was  satisfied  to  at  least  a  tolerance  of  10  \  and  more 

often  10  ^  or  10  \  When  h  ■  0.1,  this  tolerance  was  rather  greater, 

-4  -6 

varying  from  10  to  10  .  In  both  cases  this  check  improves  as  n  in¬ 

creases,  giving  further  evidence  that  the  accuracy  improves. 

Finally,  we  must  discuss  briefly  the  solution  of  (38)  in  the  case 
whi.ch  corresponds  to  flow  past  a  cylinder.  Although,  in  practice,  the 
range  of  integration  is  still  restricted  to  be  a  finite  number  l (correspond¬ 
ing  to  the  imposed  finite  boundary  XY  in  Figure  3) ,  in  theory  this  range 
is  infinite  in  this  problem.  The  new  point  is  that  the  boundary  conditions 
(39b)  are  not  available  now.  Although,  as  we  have  already  pointed  out, 
only  the  conditions  (39a)  are  in  theory  required  to  solve  the  step-by-step 
problem,  obviously  some  replacement  for  (39b)  must  be  found  if  the  methods 
just  described  are  to  be  used.  No  new  problem  exists  for  the  determination 
of  q(x)  (note  that  x  is  used  for  a  in  this  section).  The  condition  q(0)-0 
still  holds.  We  do,  however,  require  to  know  p(£)  to  initiate  the  backward 
integration  of  (41). 

To  consider  the  problem  in  detail  we  should  first  examine  the  nature 

of  the  time-dependent  solution  of  (10)  for  large  a  in  order  to  ensure  that 

the  derivation  of  r  (a,t)  from  (25)  is  such  that,  as  a  +  »,  the  integral 

n 

in  the  condition  (30)  converges  at  its  upper  limit.  This  ensures  that  the 
problem  is  properly  posed.  Actually  this  could  be  done  by  linearizing  the 
equation  (10)  with  the  outer  boundary  conditions  (15),  following  the  usual 
manner  of  Oseen,  and  obtaining  the  exact  solution  to  this  linearized 
problem.  The  problem  has  already  been  considered  in  the  steady  case 
Os/3t  -  0  in  (10))  for  flow  past  a  flat  plate  in  the  paper  by  Dennis  and 
Dunwoody  already  cited.  The  treatment  of  the  time-dependent  case  considered 
here  is  similar  but  much  more  involved  and  will  not  be  considered.  It  will 
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be  assumed,  as  la  certainly  the  case  In  steady  flow,  that  the  behavior  of 
rn(a,t)  as  a  •  Is  such  that  the  Integral  In  (30)  converges.  To  find  an 
approximation  to  pQ(l)  *t  a  finite  t  we  proceed  as  follows. 

The  complementary  function  of  (41)  Involves  the  term  exp(nx).  Prom 
the  boundary  conditions  (27)  this  term  can  only  appear  In  the  required 
solution  when  n  ■  1.  Leaving  aside  this  case,  we  write  (41)  In  the  form 

np  ■  p'  -  r  (67) 

and  then,  for  some  large  enough  value  x  ■  1,  take 

p(l)A< -r(t)/n  (68) 

as  an  approximation  to  (67).  This  Is  a  valid  approximation  provided  that 
It  makes  p'(l),  viz.  -r'(l)/n,  small  compared  with  r(l).  If  this  Is 
satisfied,  a  second  approximation  Is  obtained  from  (67)  in  the  form 

p(£)»-  ^  (r(l)  +  r'(t)/n)  (69) 

2 

provided  that  r"(l)/n  in  small.  The  process  may  be  continued  as  long  as 
It  remains  valid.  The  case  n  -  1  differs  only  slightly.  In  this  case  It 
follows  from  (27)  that  the  complementary  function  of  (41)  contributes  a 
leading  term.  From  the  definition  of  p(x)  this  is  seen  to  be  2ke  .  The 
approximations  (68)  or  (69)  come  from  a  particular  solution  which  does 

l 

not  contain  the  complementary  function.  Thus  when  n  ■  1,  the  term  2ke  must 
be  added  to  the  right  side  of  (68)  or  (69) .  Actually  the  case  n  ■  1  is 
not  of  much  consequence  since  practically  any  standard  step-by-step  process 
will  succeed  In  this  case. 

This  method  of  approximating  p(l)  is  suggested  by  the  fact  that,  In 
the  case  of  steady  flow  (cf  Dennis  and  Dunwoody), 

r  (x)  -*•  nc,  as  x  -*•  •, 

where  c  is  a  constant  which  can  be  expressed  In  terms  of  the  drag  exerted 
by  the  fluid  on  the  cylinder.  It  is  certain  In  this  case  that  if  l  is 
taken  large  enough  the  method  will  yield  a  valid  approximation.  In  the 
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general  tine-dependent  case  the  accuracy  of  the  approximation  must  be 
judged  according  to  the  results  achieved.  Finally,  one  important  factor 
Influences  the  question  of  the  accuracy  of  solutions  obtained  using 
approximations  such  as  (68)  and  (69)  to  the  initial  condition  for  the 
integration  of  (41) .  The  error-propagation  of  the  backward  integration 
of  (41)  is  identical  with  that  of  the  forward  integration  of  (42).  The 
particular  discussions  of  equation  (51)  already  given,  and  their  obvious 
generalization,  indicate  that  an  error  in  the  initial  condition  will 
rapidly  damp  out.  Moreover,  the  higher  the  value  of  n,  the  more  rapidly 
the  damping  will  be.  Thus  errors  introduced  into  p(£)  by  approximations 
of  the  type  considered  rapidly  lose  their  effect  and  there  exists  an 
inner  range  0  <  x  <  £',  (£'<!),  where  the  accuracy  is  virtually  inde¬ 
pendent  of  the  approximation  to  p(£). 

SOME  PRACTICAL  DETAILS 

We  now  deal  with  some  points  which  have  not  yet  been  considered.  For 
example,  in  the  case  of  flow  past  a  cylinder,  the  condition  C  «  0  is  a 
crude  approximation  to  apply  at  the  boundary  XY  (Figure  3)  unless  this  is 
far  from  the  axis  a  ■  a*.  It  is  valid  in  the  early  stages  of  the  motion, 
before  appreciable  vorticity  has  had  time  to  diffuse  from  the  cylinder. 

At  much  later  times,  however,  after  the  wake  has  built  up  behind  the 
cylinder,  it  is  found  that  vorticity  of  appreciable  magnitude  extends  far 
downstream.  The  approximation  ?  ■  0  on  the  boundary  XY  is  then  a  poor  one. 

If  we  assume  that  for  large  time  the  flow  tends  to  a  steady  state, 
the  limit  flow  is  found  by  putting  3c/3t  ■  0  in  (10)  and  then  G  =  t(a,B). 
For  large  values  of  a,  the  conditions  (15)  are  applicable.  Thus  the  limit 
form  of  the  equation  (10)  for  t  and  a  both  large  can  be  written 
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(70) 


-  Rke°(co86  |jj-  -  sinB  |j)  . 

A  complete  solution  of  this  equation  which  satisfies  the  necessary  conditions 

that  C  -  0  when  B  ■  0  and  8  ■  x  can  be  verified  to  be 

;(a,B)  -  eXC08B  \  A  My)  sin  nB,  (71) 

,  n  n 
n-1 

where 

X (a)  -  Rkea/2 

and  K  Is  the  modified  Bessel  function  of  the  second  kind.  The  constants 
n 

A  (n  ■  1,2,3,...)  are  arbitrary, 
n 

Since  x  is  large  when  a  Is  large  and  the  leading  term  of  the  asymptotic 
expansion  of  Kr(x)  for  large  x  is 

Kn(x)  *  (2/xx)1*  e"\ 

which  is  independent  of  n,  then  for  large  a  equation  (71)  can  be  replaced 
by 

? (a, 8)  'v  G(8)  ex(c08S  "  1)_a/2  ,  (72) 

where  G(8)  is  a  function  of  8  alone.  Equation  (72)  gives  the  character  of 

the  steady-state  vorticity  distribution  for  large  a,  viz.  it  is  exponentially 

small  except  in  a  region  near  B  ■  0  which  becomes  narrower  as  a  increases. 

If  the  flow  is  steady,  equation  (72)  can  be  used  as  a  valid  boundary 

condition  (assuming  a  large  enough)  on  the  boundary  XY  in  Figure  3.  If 

a  *  a  denotes  this  boundary  then  (72)  gives 
in 

C(a,8)  c(a  ,8)  exp{ (x  -  x_)(cosB  -  l)-(a  -  a  )/2)  .  (73) 

m  lu  In 

Here  x  denotes  the  value  of  x  at  a  ■  a  .  Actually  the  treatment  given 
m  n 

here  is  essentially  similar  to  that  described  in  a  recent  paper  by  Dennis, 
Hudson  and  Smith  [l 8 Jin  a  problem  of  steady  heat  convection,  although  the 
details  are  not  precisely  the  same. 

For  the  general  time-dependent  case  we  cannot  assert  that  (73)  is 
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strictly  valid  owing  to  the  presence  of  the  nonzero  3c/3t  term  in  (10). 

However,  the  following  argument  is  reasonable.  If  the  solution  approaches 

a  steady  state  then  (73)  is  a  good  approximation  for  moderate  and  large 

times,  since  3c /3t  will  be  small.  In  the  earlier  stages  of  the  motion  the 

vorticity  in  the  external  field  is  anyway  small.  In  any  intermediate  stage, 

the  use  of  (73)  can  hardly  be  worse  than  assuming  C  ■  0,  for  it  gives  the 

general  exponential-type  decay  which  is  inherent  in  the  problem.  Thus  at 

a  given  time  t,  (73)  may  be  assumed  to  give  c(ot  +h,8,t)  from  c(a  ,B,t) 

m  m 

for  all  values  of  8  corresponding  to  the  grid  points  on  XY.  Then  (19)  may 

be  used  to  calculate  c(a  ,8,t  +  At)  on  XY. 

m 

One  other  point  which  requires  some  explanation  is  the  question  of 
how  the  solution  of  the  problem  of  convection  in  the  closed  cell  may  be 
completed.  It  has  already  been  noted  that  some  difficulties  arise  in  this 
problem,  especially  if  the  Filon  quadrature  formula  (37)  is  used  to  cal¬ 
culate  r  (x,t).  It  must  be  admitted  that  the  method  is  less  satisfactory 
n 

in  this  case,  but  we  may  proceed  as  follows.  Suppose  the  solution  every¬ 
where  has  been  completed  through  time  t.  Equation  (19)  then  gives 

C  (t  +  At)  at  every  internal  grid  point.  Normally  At  is  small  to  ensure 
o 

stability  of  the  process,  so  the  solution  for  CQ(t  +  At)  is  not  radically 

different  from  c  (t).  The  new  internal  solution  is  then  used  with  the  old 
o 

values  (at  time  t)  of  ?  on  y  ■  0,it  to  evaluate  rn(x»t  +  At).  The  equations 
(24)  can  now  be  solved  and  the  solution  for  4>(x,y,t  +  At)  at  all  internal 
grid  points  computed  from  (23).  The  solution  will  not  be  exactly  right 
because  we  have  used  CQ(t)  on  y  ■  0,it.  It  will,  however,  be  a  very  good 
approximation  and  it  can  now  be  introduced  into  the  finite-difference 
equations  (18)  and  improved  using  one  of  the  usual  Iterative  schemes. 

The  iterative  solution  of  the  Poisson  equation  (3)  is  not  entirely  eliminated 
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In  this  problem,  but  la  greatly  reduced.  From  the  final  solution, 
boundary  values  of  C  on  the  boundary  OZ  (Figure  1)  are  calculated  from 
(22)  with  similar  equations  on  the  other  boundaries.  This  avoids  the 
use  of  (31)  and  also  ensures  that  the  slope  boundary  condition  is  utilized 
on  all  the  boundaries. 

CALCULATIONS  OF  THE  FLOW  PAST  A  CIRCULAR  CYLINDER 

To  illustrate  the  method  we  give  brief  details  of  some  calculations 
of  the  flow  past  a  circular  cylinder.  The  details  of  the  transformation 
(8)  are  those  given  by  the  equations  (11)  and  (12).  The  dimensionless 
quantities  ip  and  (  are  related  to  the  corresponding  dimensional  quantities 
♦  '  and  C'  by  the  equations  ip'  ■  Uap,  -  (U/a)C.  Here  a  Is  the  radius  of 
the  cylinder.  The  Reynolds  number  in  (10)  is  R  ■  Ua/v,  where  U  is  the 
constant  stream  velocity.  Actually  it  is  more  usual  to  work  in  terms  of 
the  Reynolds  number  R*  -  2Ua/v,  and  we  shall  adhere  to  this.  The  motion 
was  started  impulsively  from  rest  and  the  calculations  continued  until  a 
steady  state  was  reached. 

For  a  given  Reynolds  number  there  are  several  parameters  which  can  be 

varied.  Since  the  object  of  the  calculations  is  to  illustrate  the  method, 

some  of  these  have  been  chosen  to  be  consistent  with  those  used  in  previous 

studies  in  the  literature.  This  applies  to  the  spatial  grid  sizes  h  and 

h. ,  the  time  step  At,  and  the  value  a  defining  the  position  of  the 
1  m 

boundary  XY.  In  the  present  method  the  finite  number  nQ  of  terms  taken 

to  approximate  the  infinite  series  (23)  is  also  a  parameter.  For  each 

Reynolds  number,  separate  calculations  were  carried  out  for  at  least  two 

values  of  n  .  The  details  of  the  parameters  used  in  the  calculations  are 
0 

given  in  Table  2. 
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R* 

At 

h-h1 

a  h 
m 

n 

0 

7 

0.03 

ir/20 

1 

5,10 

10 

0.04 

o 

CM 

1 

5,10 

20 

0.08 

w/20 

1 

5,10 

40 

0.04 

it/40 

1 

5,10,20 

TABLE  2.  Parameters  of  calculations 

The  influence  on  the  solutions  of  the  number  n  given  in  this  table 

o 

can  be  summarized  as  follows.  For  each  of  the  cases  R*  *  7,10,20  there  was 

some  change  in  the  features  of  the  solution  from  n  *'  5  to  n  *  10  but  it 

o  o 

was  not  great,  although  perhaps  rather  greater  for  R*  -  20.  For  R*  »  AO 

there  was  a  somewhat  greater  change  from  nQ  ■  5  to  nQ  ■  10  but  a  relatively 

insignificant  one  fr*  n  n  ■  10  to  n  -20.  For  this  latter  reason  solutions 

o  o 

for  nQ  -  20  were  not  obtained  for  the  lower  Reynolds  numbers.  Actually 
there  is  little  doubt  that  as  the  Reynolds  number  increases,  more  terms  of 
(23)  are  required  to  represent  the  stream  function  accurately,  particularly 
for  the  late  time  and  steady  solutions,  when  the  wake  has  fullv  developed. 
This  is  essentially  the  reason  for  the  inadequacy  of  the  higher  Reynolds 
number  solutions  given  by  Dennis  and  Shimshoni  [l9j  in  a  study  of  the 
steady  flow  problem.  In  this  study  a  similar  series  to  that  used  in  the 
present  paper  was  used  for  the  stream  function  and  also  a  series  was  adopted 
for  the  vorticity. 

Most  previously  published  material  is  concerned  with  the  steady  flow 

and  we  shall  give  details  only  of  the  final  steady-state  solutions  obtained 

for  the  greatest  value  of  n  used  in  each  case.  In  Figures  5,6,7  and  8  the 

o 

steady  streamlines  are  shown  for  the  four  cases.  All  patterns  are  reason¬ 
ably  consistent  with  previously  known  results.  Figure  6  is  consistent 
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with  the  results  of  Keller  and  Takami  and  Figure  8  with  those  of  Kawaguti 
[20^  and  of  Apelt.  Actually  the  length  of  the  separated  region  in  Figure  8, 
measured  from  the  rear  generator,  is  about  2.5  diameters.  Apelt  obtained 
2.13  diameters  and  Kawaguti  about  2  diameters  for  this  quantity  in  their 
investigations  of  steady  flow.  Kawaguti  and  Jain  give  about  2.76  diameters 
in  their  limiting  solution  for  time-dependent  flow.  For  the  angle  of 
separation  at  R*  ■  40  we  obtain  about  54°.  Estimates  in  the  three  invest¬ 
igations  cited  vary  from  50°  to  53.7°.  Our  value  of  29°  for  the  angle  at 
R*  ■  10  seems  to  agree  well  with  Keller  and  Takami' s  result.  The  dis¬ 
tribution  of  vorticity  over  the  surface  of  the  cylinder  is  shown  for  the 
four  cases  in  Figure  9.  Again  the  results  seem  to  be  in  agreement  with 
previous  results,  where  comparisons  are  possible. 

Distributions  of  pressure  over  the  surface  of  the  cylinder  are  shown 
in  Figure  10.  The  quantity  plotted  is  the  dimensionless  pressure  coefficient 

p(8)-p#e 

C(8)  -  - -j—  ,  (74) 

\  PIT 

where  p(B)  is  the  pressure  at  given  angle  8  on  the  surface  and  rB  is  the 
constant  pressure  at  large  distances  from  the  cylinder.  According  to 
Roshko  [2lJ,  the  coefficient  C(0)  at  the  rear  stagnation  point  should,  for 
high  enough  Reynolds  number,  vary  according  to  the  law 

C(0)  <v  AR*->S  ,  .  (75) 

where  A  is  constant.  From  the  present  solutions  we  obtain  the  results 
Indicated  in  Table  3. 

R» _ 7 _ 10 _ 20 _ 40_ 

C(0)  -0.913  -0.770  -0.577  -0.528 

A  -2.41  -2.43  -2.58  -3.34 

TABLE  3.  Pressure  coefficient  at  8  ■  0 
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The  quantity  A  In  this  table  does  not  appear  to  be  approaching  a  definite 
constant  as  R*  increases,  which  we  might  expect  if  the  result  (75)  were 
valid.  However,  perhaps  the  Reynolds  numbers  are  a  little  small  for  any 
definite  conclusions  to  be  made. 

The  drag  force  D(t)  exerted  by  the  fluid  on  the  cylinder  is  usually 
expressed  in  terms  of  a  dimensionless  drag  coefficient  C^(t)  defined  by 

Cp(t)  -  D(t)/pU2a. 

It  can  easily  be  shown  in  the  present  case  of  a  circular  cylinder  (see 

e.g.  Kawagutl  and  Jain  [[8]  that 

CD(t)  "  ‘  t*  [  V»ln6d6  +  t*  [  (3C/3n>0  sinBdB,  (76) 

'  o  '  o 

where  the  subscript  here  denotes  values  at  a  ■  0.  The  first  term  on  the 

right  gives  the  friction  drag  coefficient  and  the  second  the  pressure 

drag  coefficient,  denoted  respectively  by  and  C^.  It  may  be  observed 

that  very  simple  expressions  can  be  obtained  for  these  quantities  in  the 

2  —2a 

present  method.  From  (31),  noting  that  H  -  e  ,  we  obtain 

CDf(t)  -  2*^(0, t)/R* 

and 

CDp(t)  -  -  2*{rJ(0,t)  -  2r1(0,t)}/R*, 
where  the  prime  denotes  differentiation  with  regard  to  a. 

Some  values  of  the  drag  coefficients  in  the  limit  solution  for  large 
t  are  shown  in  Table  4.  They  are  assumed  to  be  thosfe  appropriate  to  the 
limit  t  •,  since  they  had  approximately  converged  to  steady  values. 
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R* 

CDf<-> 

So'*’ 

cD(.) 

7 

1.565 

1.882 

3.AA7 

10 

1.2A7 

1.599 

2.8A6 

20 

0.F90 

1.195 

1.985 

A0 

0.52A 

0.999 

1.523 

TABLE  A. 

Steady  drag  values 

On  the  whole  they  agree  well  with  previous  estimates.  Keller  and  Takaml 

give  the  total  drag  coefficient  as  2.7A  at  R*  ■  10.  For  R*  «•  AO,  Apelt 

gives  Cjj(«)  m  1.A96  and  Kawagutl  ■  1.618. 

The  above  selection  of  results  Indicate  that,  for  the  cases  considered, 

results  of  accuracy  comparable  with  those  In  the  literature  can  be  obtained. 

Actually  solutions  for  R*  ■  70  and  R*  ■  100  have  also  been  attempted. 

Firstly,  R*  ■  100  was  attempted  using  the  same  grid  sizes  as  for  R*  -  A0. 

A  steady  state  solution  was  obtained,  but  a  distorted  one  which  indicated 

quite  unrealistic  properties  and  which  were  obviously  due  to  poor  finite- 

difference  approximations.  The  same  would  most  surely  have  been  true  of 

the  limiting  solution  at  R*  ■  100  in  Kawagutl  and  Jain's  work,  had  they 

not  discontinued  the  calculation  at  an  early  time,  for  they  used  a  grid 

size  of  only  ir/30.  A  solution  for  R*  ■  70  was  then  attempted  taking 

h  ■  h.  ■  ir/60.  This  was  successful  and  was  clearly  approaching  a  realistic 
1  * 

solution  for  large  times.  A  similar  solution  was  then  attempted  for 
R*  -  100  with  h  ■  h^  -  w/60.  This,  too,  was  successful.  However,  the 
approach  to  the  final  solution  In  the  wake  was  so  slow  that  both  solutions 
were  discontinued  after  a  large  number  of  time  steps.* 

*  Both  have  since  been  completed  using  steady-state  methods  and  will 
shortly  be  published. 
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SUHMARY  AND  CONCLUSION 


A  method  has  been  described  which  allows  certain  classes  of  time- 
dependent  integrations  of  the  Navler-Stokes  equations  to  be  treated 
entirely  by  initial-value  techniques.  The  main  object  has  been  to  deal 
with  the  problem  of  flow  past  cylinders.  One  of  the  eMects  was  the  hope 
that  the  introduction  of  initial-value  techniques  would  speed  uj.  the 
Integrations.  As  yet  it  is  too  early  to  say  how  well  this  has  been 
achieved,  but  one  example  can  be  quoted.  Kawaguti  and  Jain  give  a 
typical  time  of  2  hours  CDC  3600  machine  time  required  to  complete  their 
study  of  the  flow  past  a  circular  cylinder  at  R*  ■  40,  using  grid  sizes 
h  -  h^  ■  ir/30.  A  similar  study  on  the  same  machine  using  present  methods 
with  the  same  h  and  h^  (which  are  actually  rather  too  large  to  give  a 
very  good  solution  at  this  Reynolds  number)  took  less  than  half  an  hour. 
However,  comparisons  are  difficult  because,  naturally,  much  depends  upon 
the  efficiency  of  the  particular  iterative  method  used  to  solve  the 
Poisson  equation  if  boundary-value  techniques  are  used. 

Actually,  if  steady-state  solutions  are  the  object  of  the  integration, 
it  has  been  found  more  satisfactory  to  use  steady-state  methods  (i.e.  solve 
(10)  with3c/3t  *0).  It  is  quite  easy  to  adapt  the  above  method  to  this 
case.  In  a  recent  project  carried  out  jointly  at  the  Mathematics  Research 
Center  and  the  University  of  Western  Ontario  a  number  of  problems  involving 
steady  flow  past  cylinders  have  been  solved.  These  will  be  published 
subsequently. 
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FIG, 6 
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